Using Arena as a Go Library
In this tutorial, you’ll learn how to use Arena as a Go library instead of a CLI tool. This is useful when you want to integrate LLM testing into your own applications, build custom testing tools, or generate test scenarios dynamically.
What You’ll Build
By the end of this tutorial, you’ll have a working Go program that:
- Creates Arena configurations programmatically
- Executes test scenarios
- Retrieves and processes results
- Does all of this without touching YAML files
Prerequisites
- Go 1.21 or later installed
- Basic understanding of Go programming
- Familiarity with Arena concepts (scenarios, providers)
Step 1: Set Up Your Go Project
Create a new directory and initialize a Go module:

    mkdir arena-lib-demo
    cd arena-lib-demo
    go mod init arena-lib-demo

Step 2: Install Dependencies
Add PromptKit as a dependency:

    go get github.com/AltairaLabs/PromptKit/pkg/config
    go get github.com/AltairaLabs/PromptKit/runtime/prompt
    go get github.com/AltairaLabs/PromptKit/tools/arena/engine
    go get github.com/AltairaLabs/PromptKit/tools/arena/statestore

Step 3: Create Your First Programmatic Test
Create a file named main.go:

    package main

    import (
    	"context"
    	"fmt"
    	"log"

    	"github.com/AltairaLabs/PromptKit/pkg/config"
    	"github.com/AltairaLabs/PromptKit/runtime/prompt"
    	"github.com/AltairaLabs/PromptKit/tools/arena/engine"
    	"github.com/AltairaLabs/PromptKit/tools/arena/statestore"
    )

    func main() {
    	// Step 1: Create a prompt configuration
    	promptConfig := &prompt.Config{
    		Spec: prompt.Spec{
    			TaskType:    "assistant",
    			Version:     "v1.0.0",
    			Description: "A helpful AI assistant",
    			SystemTemplate: `You are a helpful AI assistant.
    Be concise and accurate in your responses.`,
    		},
    	}

    	// Step 2: Create the Arena configuration
    	cfg := &config.Config{
    		LoadedProviders: map[string]*config.Provider{
    			"mock-provider": {
    				ID:    "mock-provider",
    				Type:  "mock",
    				Model: "gpt-4",
    			},
    		},
    		LoadedPromptConfigs: map[string]*config.PromptConfigData{
    			"assistant": {
    				Config:   promptConfig,
    				TaskType: "assistant",
    			},
    		},
    		LoadedScenarios: map[string]*config.Scenario{
    			"basic-test": {
    				ID:          "basic-test",
    				TaskType:    "assistant",
    				Description: "Basic conversation test",
    				Turns: []config.TurnDefinition{
    					{Role: "user", Content: "What is 2+2?"},
    					{Role: "user", Content: "What's the capital of France?"},
    				},
    			},
    		},
    		Defaults: config.Defaults{
    			Temperature: 0.7,
    			MaxTokens:   500,
    			Output: config.OutputConfig{
    				Dir: "out",
    			},
    		},
    	}

    	fmt.Println("Building Arena engine...")

    	// Step 3: Build engine components
    	providerReg, promptReg, mcpReg, executor, err := engine.BuildEngineComponents(cfg)
    	if err != nil {
    		log.Fatalf("Failed to build components: %v", err)
    	}

    	// Step 4: Create the engine
    	eng, err := engine.NewEngine(cfg, providerReg, promptReg, mcpReg, executor)
    	if err != nil {
    		log.Fatalf("Failed to create engine: %v", err)
    	}
    	defer eng.Close()

    	// Step 5: Generate execution plan
    	plan, err := eng.GenerateRunPlan(nil, nil, nil)
    	if err != nil {
    		log.Fatalf("Failed to generate plan: %v", err)
    	}

    	fmt.Printf("Generated %d test combinations\n", len(plan.Combinations))

    	// Step 6: Execute tests
    	ctx := context.Background()
    	runIDs, err := eng.ExecuteRuns(ctx, plan, 2)
    	if err != nil {
    		log.Fatalf("Failed to execute: %v", err)
    	}

    	// Step 7: Retrieve results
    	arenaStore := eng.GetStateStore().(*statestore.ArenaStateStore)

    	for i, runID := range runIDs {
    		result, err := arenaStore.GetRunResult(ctx, runID)
    		if err != nil {
    			log.Printf("Failed to get result: %v", err)
    			continue
    		}

    		fmt.Printf("\nTest %d: %s\n", i+1, result.ScenarioID)
    		fmt.Printf("Status: %s\n", getStatus(result.Error))
    		fmt.Printf("Duration: %s\n", result.Duration)
    		fmt.Printf("Turns: %d\n", len(result.Messages))
    	}

    	fmt.Println("\n✅ All tests completed!")
    }

    func getStatus(errMsg string) string {
    	if errMsg == "" {
    		return "✅ Success"
    	}
    	return "❌ Failed: " + errMsg
    }

Step 4: Run Your Program

    go run main.go

You should see output like:

    Building Arena engine...
    Generated 1 test combinations

    Test 1: basic-test
    Status: ✅ Success
    Duration: 2.5ms
    Turns: 5

    ✅ All tests completed!

Step 5: Add Real Provider Testing
Let’s enhance the example to use a real provider (OpenAI):

    // Replace the mock provider with OpenAI
    LoadedProviders: map[string]*config.Provider{
    	"openai-gpt4": {
    		ID:    "openai-gpt4",
    		Type:  "openai",
    		Model: "gpt-4",
    		// API key is read from OPENAI_API_KEY env var by default
    	},
    },

Then run with your API key:

    export OPENAI_API_KEY=sk-...
    go run main.go

Step 6: Add Assertions
Let’s add validation to our scenario:

    LoadedScenarios: map[string]*config.Scenario{
    	"math-test": {
    		ID:          "math-test",
    		TaskType:    "assistant",
    		Description: "Test math responses",
    		Turns: []config.TurnDefinition{
    			{
    				Role:    "user",
    				Content: "What is 2+2?",
    				Assertions: []asrt.AssertionConfig{
    					{
    						Type:  "contains",
    						Value: "4",
    					},
    				},
    			},
    		},
    	},
    },

Don’t forget to import the assertions package:

    import (
    	// ... other imports
    	asrt "github.com/AltairaLabs/PromptKit/tools/arena/assertions"
    )

What You’ve Learned
✅ How to create Arena configurations in Go code
✅ How to build engine components programmatically
✅ How to execute tests and retrieve results
✅ How to switch between mock and real providers
✅ How to add assertions to scenarios
Next Steps
- How-To: Use Arena as a Go Library - Advanced integration patterns
- Reference: API Documentation - Complete API reference
- Example: Programmatic Arena - Full working example
Common Patterns
Dynamic Scenario Generation

    // Generate scenarios from test data
    scenarios := make(map[string]*config.Scenario)
    for i, testCase := range testCases {
    	scenarios[fmt.Sprintf("test-%d", i)] = &config.Scenario{
    		ID:       fmt.Sprintf("test-%d", i),
    		TaskType: "assistant",
    		Turns: []config.TurnDefinition{
    			{Role: "user", Content: testCase.Input},
    		},
    	}
    }

    cfg.LoadedScenarios = scenarios

Multiple Provider Comparison

    // Test against multiple providers
    providers := map[string]*config.Provider{
    	"openai": {ID: "openai", Type: "openai", Model: "gpt-4"},
    	"claude": {ID: "claude", Type: "anthropic", Model: "claude-3-5-sonnet-20241022"},
    	"gemini": {ID: "gemini", Type: "gemini", Model: "gemini-2.0-flash-exp"},
    }

    cfg.LoadedProviders = providers

Custom Result Processing

    // Process results with custom logic
    for _, runID := range runIDs {
    	result, err := arenaStore.GetRunResult(ctx, runID)
    	if err != nil {
    		// Skip runs whose results cannot be loaded instead of using a bad result
    		log.Printf("Failed to get result for run %v: %v", runID, err)
    		continue
    	}

    	// Extract metrics
    	metrics := map[string]interface{}{
    		"provider":    result.ProviderID,
    		"duration_ms": result.Duration.Milliseconds(),
    		"cost":        result.Cost.TotalCost,
    		"tokens":      result.Cost.InputTokens + result.Cost.OutputTokens,
    		"success":     result.Error == "",
    	}

    	// Send to your analytics system (sendToAnalytics is your own function)
    	sendToAnalytics(metrics)
    }