
Programmatic vs CLI Usage

Arena can be used in two ways: as a command-line tool or as a Go library. This guide explains the differences, trade-offs, and when to choose each approach.

```sh
promptarena run config.arena.yaml
```

What it is: Using Arena as a standalone command-line tool with YAML configuration files.

```go
eng, _ := engine.NewEngine(cfg, providerReg, promptReg, mcpReg, executor)
runIDs, _ := eng.ExecuteRuns(ctx, plan, 4)
```

What it is: Importing Arena as a Go library and controlling it programmatically.

| Use Case | CLI | Programmatic | Why |
| --- | --- | --- | --- |
| Quick manual testing | ✅ Best | ⚠️ Overkill | CLI is faster for ad-hoc testing |
| CI/CD pipelines | ✅ Good | ✅ Good | Both work well; CLI is simpler |
| Integration with apps | ❌ Limited | ✅ Best | Need programmatic control |
| Dynamic test generation | ❌ Hard | ✅ Best | Can't easily generate YAML |
| Custom result processing | ⚠️ Via scripts | ✅ Best | Direct access to results |
| Team collaboration | ✅ Best | ⚠️ Harder | YAML files are git-friendly |
| Complex workflows | ⚠️ Limited | ✅ Best | Need conditional logic |
| Learning Arena | ✅ Best | ⚠️ Harder | CLI has a gentler learning curve |
```sh
# Quick iteration on prompts
promptarena run --scenario greeting-test
promptarena run --provider claude
promptarena run --verbose
```

Why: Immediate feedback, no compilation needed.

```yaml
# .github/workflows/test.yml
- name: Run Arena tests
  run: promptarena run --ci --format junit
```

Why: Simple, declarative, version-controlled configuration.

```yaml
# config.arena.yaml (committed to git)
scenarios:
  - file: scenarios/customer-support.yaml
  - file: scenarios/edge-cases.yaml
```

Why: Everyone uses the same config, changes are tracked, reviews are easy.

```sh
# Watch tests execute in real-time
promptarena run
```

Why: Beautiful terminal UI with progress, logs, and results.

```sh
# Can't do this with the CLI:
# - Generate 1000 test scenarios from a database
# - Run tests based on runtime conditions
```

```sh
# Limited to built-in outputs:
# - JSON, HTML, JUnit, Markdown
# Can't easily pipe to custom analytics
```

```sh
# Can't embed in your app:
# Have to shell out and parse output
system("promptarena run ...")
```
```go
// Embed testing in your app
func validateUserPrompt(prompt string) error {
	cfg := buildTestConfig(prompt)
	eng, _ := setupEngine(cfg)
	results := runTests(eng)
	return validateResults(results)
}
```

Why: Direct integration, no external processes.

```go
// Generate scenarios from data
scenarios := make(map[string]*config.Scenario)
for _, conversation := range loadFromDB() {
	scenarios[conversation.ID] = buildScenario(conversation)
}
cfg.LoadedScenarios = scenarios
```

Why: Can generate tests programmatically.

```go
// Complex conditional logic
for attempt := 0; attempt < maxRetries; attempt++ {
	results := runTests(eng)
	if allPassed(results) {
		break
	}
	adjustConfig(cfg, results)
}
```

Why: Full control over execution flow.

```go
// Send metrics to your system
for _, result := range results {
	metrics := extractMetrics(result)
	sendToDatadog(metrics)
	updateDashboard(metrics)
}
```

Why: Direct access to result objects.

```go
// Custom testing frameworks
type TestRunner struct {
	engine *engine.Engine
}

func (r *TestRunner) RunWithRetry(...)
func (r *TestRunner) CompareProviders(...)
func (r *TestRunner) GenerateReport(...)
```

Why: Build specialized tools on top of Arena.

```go
// Every change requires:
// 1. Edit code
// 2. Recompile
// 3. Run
// vs CLI: just edit YAML and run
```

```go
// Steeper learning curve:
// - Need to know Go
// - Understand the API
// - Set up a dev environment
// vs CLI: just install and run
```

You can combine both approaches:

Pattern: CLI for Standard Tests, Code for Custom

```sh
# Standard regression tests (CLI)
promptarena run regression-tests.yaml --ci

# Custom dynamic tests (programmatic)
go run custom-tests/main.go
```

```go
// Generate a config file
cfg := buildConfigFromData()
saveToYAML(cfg, "generated.arena.yaml")

// Run with the CLI
exec.Command("promptarena", "run", "generated.arena.yaml").Run()
```

Pattern: CLI for Development, Programmatic for Production

```sh
# During development
promptarena run --verbose

# In production CI/CD: use a Go binary with custom logic
./custom-test-runner
```

If you outgrow the CLI:

1. Keep existing YAML configs

   ```go
   // Load existing configs
   eng, _ := engine.NewEngineFromConfigFile("config.arena.yaml")
   ```

2. Gradually add programmatic logic

   ```go
   // Start with file loading
   eng, _ := engine.NewEngineFromConfigFile("config.arena.yaml")
   // Add custom processing
   results := executeAndProcess(eng)
   ```

3. Eventually go fully programmatic

   ```go
   // Build the config in code
   cfg := buildConfig()
   eng, _ := setupEngine(cfg)
   ```
CLI:

- Startup time: ~100-500ms (Go binary startup)
- Overhead: minimal (one process per run)
- Memory: isolated per execution

Programmatic:

- Startup time: 0ms (same process)
- Overhead: none (native function calls)
- Memory: shared with your application

Winner: Programmatic for high-frequency testing.

```sh
# Easy debugging
promptarena run --verbose --log-level debug
promptarena debug --config arena.yaml
promptarena config-inspect
```

```go
// Standard Go debugging
log.Printf("Config: %+v", cfg)
runIDs, err := eng.ExecuteRuns(ctx, plan, 4)
log.Printf("Run IDs: %v, err: %v", runIDs, err)
```

Winner: CLI for quick troubleshooting, programmatic for deep debugging.

Choose the CLI when:

- ✅ You're new to Arena
- ✅ You have standard testing needs
- ✅ Your tests are stable and predefined
- ✅ You want quick results
- ✅ Your team isn't familiar with Go

Choose programmatic usage when:

- ✅ You need to integrate with applications
- ✅ You need dynamic test generation
- ✅ You need custom result processing
- ✅ You're building testing tools
- ✅ You have complex conditional workflows
- ✅ You need high-frequency testing

Use both when:

- ✅ You want flexibility
- ✅ You have different use cases
- ✅ Some tests are standard, some are custom
- ✅ You want CLI for development, code for production
```sh
# Test product recommendations daily
promptarena run product-tests.yaml --ci
```

Why: Standard tests, CI/CD integration, team collaboration.

```go
// Test each customer's custom prompt
for _, customer := range customers {
	cfg := buildConfigForCustomer(customer)
	results := runTests(cfg)
	notifyCustomer(customer, results)
}
```

Why: Dynamic per-customer testing, custom notifications.

```sh
# Standard benchmarks (CLI)
promptarena run benchmarks/*.yaml

# Experimental tests (programmatic)
go run experiments/ablation-study.go
```

Why: Standard tests for baselines, code for experiments.

| Aspect | CLI | Programmatic |
| --- | --- | --- |
| Ease of use | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Flexibility | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Team collaboration | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Integration | ⭐⭐ | ⭐⭐⭐⭐⭐ |
| Custom logic | ⭐⭐ | ⭐⭐⭐⭐⭐ |
| Learning curve | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Performance | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |

Golden Rule: Start with CLI. Switch to programmatic when you hit limitations.