# Programmatic vs CLI Usage
Arena can be used in two ways: as a command-line tool or as a Go library. This guide explains the differences, trade-offs, and when to choose each approach.
## Overview

### CLI Approach

```bash
promptarena run config.arena.yaml
```

What it is: Using Arena as a standalone command-line tool with YAML configuration files.
### Programmatic Approach

```go
eng, _ := engine.NewEngine(cfg, providerReg, promptReg, mcpReg, executor)
runIDs, _ := eng.ExecuteRuns(ctx, plan, 4)
```

What it is: Importing Arena as a Go library and controlling it programmatically.
## Decision Matrix

| Use Case | CLI | Programmatic | Why |
|---|---|---|---|
| Quick manual testing | ✅ Best | ⚠️ Overkill | CLI is faster for ad-hoc testing |
| CI/CD pipelines | ✅ Good | ✅ Good | Both work well, CLI is simpler |
| Integration with apps | ❌ Limited | ✅ Best | Need programmatic control |
| Dynamic test generation | ❌ Hard | ✅ Best | Can’t easily generate YAML |
| Custom result processing | ⚠️ Via scripts | ✅ Best | Direct access to results |
| Team collaboration | ✅ Best | ⚠️ Harder | YAML files are git-friendly |
| Complex workflows | ⚠️ Limited | ✅ Best | Need conditional logic |
| Learning Arena | ✅ Best | ⚠️ Harder | CLI has gentler learning curve |
## CLI: When to Use

### ✅ Good For:

#### 1. Manual Testing and Exploration

```bash
# Quick iteration on prompts
promptarena run --scenario greeting-test
promptarena run --provider claude
promptarena run --verbose
```

Why: Immediate feedback, no compilation needed.
#### 2. Standard CI/CD Integration

```yaml
# .github/workflows/test.yml
- name: Run Arena tests
  run: promptarena run --ci --format junit
```

Why: Simple, declarative, version-controlled configuration.
#### 3. Team Collaboration

```yaml
# config.arena.yaml (committed to git)
scenarios:
  - file: scenarios/customer-support.yaml
  - file: scenarios/edge-cases.yaml
```

Why: Everyone uses the same config, changes are tracked, and reviews are easy.
#### 4. Interactive TUI

```bash
# Watch tests execute in real time
promptarena run
```

Why: Terminal UI with live progress, logs, and results.
### ❌ Not Good For:

#### Dynamic Test Generation

```bash
# Can't do this with the CLI:
# - Generate 1000 test scenarios from a database
# - Run tests based on runtime conditions
```

#### Custom Result Processing

```bash
# Limited to built-in outputs:
# - JSON, HTML, JUnit, Markdown
# Can't easily pipe to custom analytics
```

#### Application Integration

```
# Can't embed Arena in your app:
# you have to shell out and parse output.
system("promptarena run ...")
```

## Programmatic: When to Use
### ✅ Good For:

#### 1. Integration with Applications

```go
// Embed testing in your app
func validateUserPrompt(prompt string) error {
	cfg := buildTestConfig(prompt)
	eng, _ := setupEngine(cfg)
	results := runTests(eng)
	return validateResults(results)
}
```

Why: Direct integration, no external processes.
#### 2. Dynamic Test Generation

```go
// Generate scenarios from data
scenarios := make(map[string]*config.Scenario)
for _, conversation := range loadFromDB() {
	scenarios[conversation.ID] = buildScenario(conversation)
}
cfg.LoadedScenarios = scenarios
```

Why: Tests can be generated programmatically.
#### 3. Custom Workflows

```go
// Complex conditional logic
for attempt := 0; attempt < maxRetries; attempt++ {
	results := runTests(eng)
	if allPassed(results) {
		break
	}
	adjustConfig(cfg, results)
}
```

Why: Full control over execution flow.
#### 4. Custom Result Processing

```go
// Send metrics to your system
for _, result := range results {
	metrics := extractMetrics(result)
	sendToDatadog(metrics)
	updateDashboard(metrics)
}
```

Why: Direct access to result objects.
#### 5. Building Testing Tools

```go
// Custom testing frameworks
type TestRunner struct {
	engine *engine.Engine
}

func (r *TestRunner) RunWithRetry(...)
func (r *TestRunner) CompareProviders(...)
func (r *TestRunner) GenerateReport(...)
```

Why: Build specialized tools on top of Arena.
### ❌ Not Good For:

#### Quick Iteration

```go
// Every change requires:
// 1. Edit code
// 2. Recompile
// 3. Run
// vs CLI: just edit YAML and run
```

#### Team Onboarding

```go
// Steeper learning curve:
// - Need to know Go
// - Understand the API
// - Set up a dev environment
// vs CLI: just install and run
```

## Hybrid Approach
Section titled “Hybrid Approach”You can combine both approaches:
### Pattern: CLI for Standard Tests, Code for Custom

```bash
# Standard regression tests (CLI)
promptarena run regression-tests.yaml --ci

# Custom dynamic tests (programmatic)
go run custom-tests/main.go
```

### Pattern: Generate Config, Run with CLI
```go
// Generate a config file
cfg := buildConfigFromData()
saveToYAML(cfg, "generated.arena.yaml")

// Run with the CLI
exec.Command("promptarena", "run", "generated.arena.yaml").Run()
```

### Pattern: CLI for Development, Programmatic for Production
```bash
# During development
promptarena run --verbose

# In production CI/CD, use a Go binary with custom logic
./custom-test-runner
```

## Migration Path
### From CLI to Programmatic

If you outgrow the CLI:

1. Keep existing YAML configs:

   ```go
   // Load existing configs
   eng, _ := engine.NewEngineFromConfigFile("config.arena.yaml")
   ```

2. Gradually add programmatic logic:

   ```go
   // Start with file loading
   eng, _ := engine.NewEngineFromConfigFile("config.arena.yaml")
   // Add custom processing
   results := executeAndProcess(eng)
   ```

3. Eventually go fully programmatic:

   ```go
   // Build config in code
   cfg := buildConfig()
   eng, _ := setupEngine(cfg)
   ```
## Performance Considerations

### CLI

- Startup time: ~100–500 ms (Go binary startup)
- Overhead: Minimal (one process)
- Memory: Isolated per execution
### Programmatic

- Startup time: 0 ms (runs in the same process)
- Overhead: None (native function calls)
- Memory: Shared with application
Winner: Programmatic for high-frequency testing.
## Debugging

### CLI

```bash
# Easy debugging
promptarena run --verbose --log-level debug
promptarena debug --config arena.yaml
promptarena config-inspect
```

### Programmatic

```go
// Standard Go debugging
log.Printf("Config: %+v", cfg)
runIDs, err := eng.ExecuteRuns(ctx, plan, 4)
log.Printf("Run IDs: %+v, err: %v", runIDs, err)
```

Winner: CLI for quick troubleshooting, programmatic for deep debugging.
## Recommendations

### Start with CLI if:

- ✅ You’re new to Arena
- ✅ You have standard testing needs
- ✅ Your tests are stable and predefined
- ✅ You want quick results
- ✅ Your team isn’t familiar with Go
### Use Programmatic if:

- ✅ You need to integrate with applications
- ✅ You need dynamic test generation
- ✅ You need custom result processing
- ✅ You’re building testing tools
- ✅ You have complex conditional workflows
- ✅ You need high-frequency testing
### Use Both if:

- ✅ You want flexibility
- ✅ You have different use cases
- ✅ Some tests are standard, some are custom
- ✅ You want CLI for development, code for production
## Real-World Examples

### E-commerce: CLI Approach

```bash
# Test product recommendations daily
promptarena run product-tests.yaml --ci
```

Why: Standard tests, CI/CD integration, team collaboration.
### SaaS Platform: Programmatic Approach

```go
// Test each customer's custom prompt
for _, customer := range customers {
	cfg := buildConfigForCustomer(customer)
	results := runTests(cfg)
	notifyCustomer(customer, results)
}
```

Why: Dynamic per-customer testing, custom notifications.
### AI Research: Hybrid Approach

```bash
# Standard benchmarks (CLI)
promptarena run benchmarks/*.yaml

# Experimental tests (programmatic)
go run experiments/ablation-study.go
```

Why: Standard tests for baselines, code for experiments.
## Summary

| Aspect | CLI | Programmatic |
|---|---|---|
| Ease of use | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Flexibility | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Team collaboration | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Integration | ⭐⭐ | ⭐⭐⭐⭐⭐ |
| Custom logic | ⭐⭐ | ⭐⭐⭐⭐⭐ |
| Learning curve | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Performance | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
**Golden Rule:** Start with the CLI. Switch to programmatic when you hit its limitations.