# Tutorial 1: Your First Test
Learn the basics of PromptArena by creating and running your first LLM test.
## What You’ll Learn

- Install PromptArena
- Create a basic configuration
- Write your first test scenario
- Configure an LLM provider
- Run tests and review results
## Prerequisites

- An OpenAI API key (free tier works)
## Step 1: Install PromptArena

Choose your preferred installation method:

### Option 1: Homebrew (Recommended)

```sh
brew install promptkit
```

### Option 2: Go Install

```sh
go install github.com/AltairaLabs/PromptKit/tools/arena@latest
```

Verify the installation:

```sh
promptarena --version
```

You should see the PromptArena version information.
## Step 2: Create Your Test Project

### The Easy Way: Use the Template Generator

```sh
# Create a complete test project in seconds
promptarena init my-first-test --quick --provider openai

# Navigate to your project
cd my-first-test
```

That’s it! The init command created everything you need:

- ✅ Arena configuration (`arena.yaml`)
- ✅ Prompt setup (`prompts/assistant.yaml`)
- ✅ Provider configuration (`providers/openai.yaml`)
- ✅ Sample test scenario (`scenarios/basic-test.yaml`)
- ✅ Environment setup (`.env`)
### The Manual Way: Create Files Step-by-Step

If you prefer to understand each component:

```sh
# Create a new directory
mkdir my-first-test
cd my-first-test

# Create the directory structure
mkdir -p prompts providers scenarios
```

## Step 3: Create a Prompt Configuration
If you used `promptarena init`: you already have `prompts/assistant.yaml`. Feel free to edit it!
If creating manually: create `prompts/greeter.yaml`:
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: PromptConfig
metadata:
  name: greeter
spec:
  task_type: greeting
  system_template: |
    You are a friendly assistant who greets users warmly.
    Keep responses brief and welcoming.
```

What’s happening here?
- `apiVersion` and `kind`: Standard PromptKit resource identifiers
- `metadata.name`: Identifies this prompt configuration (we’ll reference it later)
- `spec.task_type`: Categorizes the prompt’s purpose
- `spec.system_template`: System instructions sent to the LLM
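Conceptually, the system template becomes the first message of every chat request. The sketch below shows the standard chat-message shape most providers accept; it is illustrative only, not PromptArena’s internals:

```python
# Illustrative: how a system template and one user turn might be
# assembled into a chat-completion payload (plain dicts, no SDK).
system_template = (
    "You are a friendly assistant who greets users warmly. "
    "Keep responses brief and welcoming."
)

def build_messages(user_content):
    """Pair the system prompt with a single user turn."""
    return [
        {"role": "system", "content": system_template},
        {"role": "user", "content": user_content},
    ]

messages = build_messages("Hello!")
print(messages[0]["role"])  # system
```

Each user turn in a scenario produces one such request, with the same system message in front.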
## Step 4: Configure a Provider

Create `providers/openai.yaml`:
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: openai-gpt4o-mini
spec:
  type: openai
  model: gpt-4o-mini
  defaults:
    temperature: 0.7
    max_tokens: 150
```

What’s happening here?
- `apiVersion` and `kind`: Standard PromptKit resource identifiers
- `metadata.name`: Friendly name for this provider configuration
- `spec.type`: The provider type (`openai`, `anthropic`, `gemini`)
- `spec.model`: Specific model to use
- `spec.defaults`: Model parameters like temperature and max_tokens
- Authentication uses the `OPENAI_API_KEY` environment variable automatically
## Step 5: Set Your API Key

```sh
# Set the OpenAI API key
export OPENAI_API_KEY="sk-your-api-key-here"

# Or add to your shell profile (~/.zshrc or ~/.bashrc)
echo 'export OPENAI_API_KEY="sk-your-key"' >> ~/.zshrc
source ~/.zshrc
```

## Step 6: Write Your First Test Scenario
Create `scenarios/greeting-test.yaml`:
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: greeting-test
  labels:
    category: basic
spec:
  task_type: greeting  # Links to prompts/greeter.yaml
  turns:
    - role: user
      content: "Hello!"
      assertions:
        - type: content_includes
          params:
            patterns: ["hello"]
          message: "Should include greeting"
        - type: content_length
          params:
            max: 100
          message: "Response should be brief"
    - role: user
      content: "How are you?"
      assertions:
        - type: content_includes
          params:
            patterns: ["good"]
          message: "Should respond positively"
```

What’s happening here?
- `apiVersion` and `kind`: Standard PromptKit resource identifiers
- `metadata.name`: Identifies this scenario
- `spec.task_type`: Links to the prompt configuration with a matching task_type
- `spec.turns`: Array of conversation exchanges
- `role: user`: Each user turn triggers an LLM response
- `assertions`: Checks that validate the LLM’s response
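The `task_type` link can be pictured as a simple lookup: each scenario is paired with the prompt config whose `task_type` matches. A minimal sketch of that idea (illustrative, not Arena’s actual implementation):

```python
# Illustrative: matching a scenario to a prompt config by task_type.
prompt_configs = [
    {"name": "greeter", "task_type": "greeting"},
]
scenario = {"name": "greeting-test", "task_type": "greeting"}

def find_prompt(scenario, configs):
    """Return the first prompt config whose task_type matches the scenario's."""
    for cfg in configs:
        if cfg["task_type"] == scenario["task_type"]:
            return cfg
    raise LookupError(f"no prompt config for task_type {scenario['task_type']!r}")

print(find_prompt(scenario, prompt_configs)["name"])  # greeter
```

This is why the scenario never names `greeter.yaml` directly: the shared `task_type: greeting` value is the join key.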
## Step 7: Create Main Configuration

Create `arena.yaml` in your project root:
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Arena
metadata:
  name: my-first-test
spec:
  prompt_configs:
    - id: greeter
      file: prompts/greeter.yaml
  providers:
    - file: providers/openai.yaml
  scenarios:
    - file: scenarios/greeting-test.yaml
```

This tells Arena which configurations to load and how to connect them.
## Step 8: Run Your First Test

```sh
promptarena run
```

You should see output like:
```
🚀 PromptArena Starting...

Loading configuration...
  ✓ Loaded 1 prompt config
  ✓ Loaded 1 provider
  ✓ Loaded 1 scenario

Running tests...
  ✓ Basic Greeting - Turn 1 [openai-gpt4o-mini] (1.2s)
  ✓ Basic Greeting - Turn 2 [openai-gpt4o-mini] (1.1s)

Results:
  Total: 2 turns
  Passed: 2
  Failed: 0
  Pass Rate: 100%

Reports generated:
  - out/results.json
```

## Step 9: Review Results
View the JSON results:

```sh
cat out/results.json
```

Or generate an HTML report:

```sh
promptarena run --format html

# Open in browser
open out/report-*.html
```

## Understanding Your First Test
Let’s break down what just happened:
### 1. Configuration Loading

Arena loaded your prompt, provider, and scenario files.
### 2. Prompt Assembly

For each turn, Arena:

- Took the system prompt from `greeter.yaml`
- Filled in the user message template
- Sent the complete prompt to OpenAI
### 3. Response Validation

Arena checked each response against your assertions:

- `content_includes`: Verified greeting words were present
- `content_length`: Ensured the response wasn’t too long
- `content_includes` (turn 2): Confirmed a positive reply to “How are you?”
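The assertion types used in this tutorial, `content_includes` and `content_length`, can be pictured as simple predicates over the response text. The sketch below is illustrative only; the case-insensitive, any-pattern matching semantics are assumptions, not PromptArena’s documented behavior:

```python
# Illustrative: what content_includes and content_length assertions check.
def content_includes(response, patterns):
    """Pass if any pattern appears in the response (case-insensitive, assumed)."""
    text = response.lower()
    return any(p.lower() in text for p in patterns)

def content_length(response, max=None, min=None):
    """Pass if the response length falls inside the given character bounds."""
    n = len(response)
    if max is not None and n > max:
        return False
    if min is not None and n < min:
        return False
    return True

reply = "Hello there! Great to see you."
print(content_includes(reply, ["hello"]))  # True
print(content_length(reply, max=100))      # True
```

A failed predicate is what shows up as a failed assertion in the run output.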
### 4. Report Generation

Arena saved results in multiple formats for analysis.
## Experiment: Modify the Test

### Add More Assertions

Edit `scenarios/greeting-test.yaml` to add more checks:
```yaml
spec:
  turns:
    - role: user
      content: "Hello!"
      assertions:
        - type: content_includes
          params:
            patterns: ["hello"]
        - type: content_length
          params:
            max: 100
        - type: latency  # type name assumed; a response-time check
          params:
            max_seconds: 3
```

Run again:

```sh
promptarena run
```

### Test Edge Cases
Create a new scenario file, `scenarios/edge-cases.yaml`:
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: edge-cases
  labels:
    category: edge-case
spec:
  task_type: greeting
  turns:
    - role: user
      content: ""
      assertions:
        - type: content_length
          params:
            min: 10
```

Add it to `arena.yaml`:

```yaml
spec:
  scenarios:
    - file: scenarios/greeting-test.yaml
    - file: scenarios/edge-cases.yaml
```

### Adjust Temperature
Edit `providers/openai.yaml`:

```yaml
spec:
  defaults:
    temperature: 0.2  # More deterministic
    max_tokens: 150
```

Run and compare:

```sh
promptarena run
```

Lower temperature = more consistent responses.
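Why does lowering temperature make responses more consistent? Temperature rescales the model’s token probabilities before sampling. A toy illustration of that rescaling, not tied to any provider:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, sharpened (low T) or flattened (high T)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 1.0))  # moderately peaked
print(softmax_with_temperature(logits, 0.2))  # top token dominates
```

At `temperature: 0.2` the most likely token absorbs nearly all the probability mass, so repeated runs tend to produce the same wording.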
## Common Issues

### “command not found: promptarena”

```sh
# Ensure the Go bin directory is in PATH
export PATH=$PATH:$(go env GOPATH)/bin
```

### “API key not found”
```sh
# Verify the environment variable is set
echo $OPENAI_API_KEY
# Should output: sk-...
```

### “No scenarios found”
Check that your `arena.yaml` paths match your directory structure:

```sh
# List your files
ls prompts/
ls providers/
ls scenarios/
```

### “Assertion failed”
This is expected! Assertions validate quality. If one fails:

- Check the error message in the output
- Review the actual response in `out/results.json`
- Adjust your assertions or prompt as needed
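When a run has many turns, a few lines of scripting can pull the failures out of `out/results.json`. The JSON shape below is assumed purely for illustration; inspect your actual file and adjust the field names:

```python
import json

# Hypothetical shape of out/results.json -- check your real file
# and rename the fields accordingly.
sample = {
    "results": [
        {"scenario": "greeting-test", "turn": 1, "passed": True},
        {"scenario": "greeting-test", "turn": 2, "passed": False},
    ]
}

def failed_turns(report):
    """Return (scenario, turn) pairs for every failed check."""
    return [(r["scenario"], r["turn"])
            for r in report["results"] if not r["passed"]]

# With a real file: report = json.load(open("out/results.json"))
print(failed_turns(sample))  # [('greeting-test', 2)]
```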
## Next Steps

Congratulations! You’ve run your first LLM test.
Continue learning:
- Tutorial 2: Multi-Provider Testing - Test across OpenAI, Claude, and Gemini
- Tutorial 3: Multi-Turn Conversations - Build complex dialog flows
- How-To: Write Scenarios - Advanced scenario patterns
Quick wins:
- Try different models: `gpt-4o`, `gpt-4o-mini`
- Add more test cases to your scenario
- Generate HTML reports: `promptarena run --format html`
## What’s Next?

In Tutorial 2, you’ll learn how to test the same scenario across multiple LLM providers (OpenAI, Claude, Gemini) and compare their responses.