
Configure LLM Providers

Learn how to configure and manage LLM providers for testing.

Providers define how PromptArena connects to different LLM services (OpenAI, Anthropic, Google, etc.). Each provider configuration specifies authentication, model selection, and default parameters.

The easiest way to set up providers is using the project generator:

```sh
# Create a project with OpenAI
promptarena init my-test --quick --provider openai

# Or choose during interactive setup
promptarena init my-test
# Select a provider when prompted: openai, anthropic, google, or mock
```

This automatically creates a working provider configuration with:

- Correct API version and schema
- Recommended model defaults
- Environment variable setup (.env file)
- Ready-to-use configuration
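
The generated project typically looks like this (an illustrative layout; the exact files depend on the options you pick during init):

```
my-test/
├── arena.yaml        # References prompts, providers, and scenarios
├── .env              # API keys (never commit this)
├── prompts/
├── providers/
│   └── openai.yaml   # Generated provider configuration
└── scenarios/
```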

For custom setups or advanced configurations, create provider files in the `providers/` directory:

```yaml
# providers/openai.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: openai-gpt4o-mini
  labels:
    provider: openai
spec:
  type: openai
  model: gpt-4o-mini
  capabilities:
    - text
    - streaming
    - vision
    - tools
    - json
  defaults:
    temperature: 0.6
    max_tokens: 2000
    top_p: 1.0
```

Authentication uses the OPENAI_API_KEY environment variable automatically.

The `capabilities` field declares which features a provider supports. Scenarios can then use `required_capabilities` to run only against providers with matching capabilities.

Available Capabilities:

| Capability | Description |
|------------|-------------|
| `text` | Basic text completion |
| `streaming` | Streaming responses |
| `vision` | Image understanding |
| `tools` | Function/tool calling |
| `json` | JSON mode output |
| `audio` | Audio input understanding |
| `video` | Video input understanding |
| `documents` | PDF/document upload support |
| `duplex` | Real-time bidirectional audio |

Example - Vision-capable provider:

```yaml
spec:
  type: openai
  model: gpt-4o
  capabilities:
    - text
    - streaming
    - vision
    - tools
    - json
```

Example - Audio-only provider:

```yaml
spec:
  type: openai
  model: gpt-4o-audio-preview
  capabilities:
    - audio
    - duplex
```

Example - Local model with limited capabilities:

```yaml
spec:
  type: ollama
  model: llama3.2:3b
  capabilities:
    - text
    - streaming
    - tools
    - json
  # Note: llama3.2:3b does NOT support vision (only the 11B/90B variants do)
```

When a scenario specifies required_capabilities, only providers with ALL listed capabilities will run that scenario. See Write Scenarios for details.
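
For example, a scenario that needs image understanding could declare its requirements like this (a hypothetical sketch; the exact field placement is documented in Write Scenarios):

```yaml
# scenarios/image-analysis.yaml (hypothetical example)
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: image-analysis-test
spec:
  required_capabilities:  # assumed placement under spec; see Write Scenarios
    - vision
    - json
  turns:
    - role: user
      content: "Describe this image."
```

With this scenario, only providers that declare both `vision` and `json` (such as the gpt-4o example above) would run it.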

To compare against a larger OpenAI model, add a second provider file:

```yaml
# providers/openai-gpt4.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: openai-gpt4o
  labels:
    provider: openai
spec:
  type: openai
  model: gpt-4o
  defaults:
    temperature: 0.7
    max_tokens: 4000
```

Available Models: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo

To test against Anthropic Claude:

```yaml
# providers/claude.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: claude-sonnet
  labels:
    provider: anthropic
spec:
  type: anthropic
  model: claude-3-5-sonnet-20241022
  defaults:
    temperature: 0.6
    max_tokens: 4000
```

Authentication uses the ANTHROPIC_API_KEY environment variable automatically.

Available Models: claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022, claude-3-opus-20240229

To test against Google Gemini:

```yaml
# providers/gemini.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: gemini-flash
  labels:
    provider: google
spec:
  type: gemini
  model: gemini-1.5-flash
  defaults:
    temperature: 0.7
    max_tokens: 2000
```

Authentication uses the GOOGLE_API_KEY environment variable automatically.

Available Models: gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash-exp

To use an Azure-hosted OpenAI deployment, point `base_url` at your resource endpoint:

```yaml
# providers/azure-openai.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: azure-openai-gpt4o
  labels:
    provider: azure-openai
spec:
  type: azure-openai
  model: gpt-4o
  base_url: https://your-resource.openai.azure.com
  defaults:
    temperature: 0.6
    max_tokens: 2000
```

Authentication uses the AZURE_OPENAI_API_KEY environment variable automatically.

To test against local models via Ollama:

```yaml
# providers/ollama.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: ollama-llama
  labels:
    provider: ollama
spec:
  type: ollama
  model: llama3.2:1b
  base_url: http://localhost:11434
  additional_config:
    keep_alive: "5m"  # Keep the model loaded for 5 minutes
  defaults:
    temperature: 0.7
    max_tokens: 2048
```

No API key required - Ollama runs locally. Start Ollama with:

```sh
# Install Ollama
brew install ollama   # macOS
# or visit https://ollama.ai for other platforms

# Start the Ollama server
ollama serve

# Pull a model
ollama pull llama3.2:1b
```

Or use Docker:

```sh
docker run -d -p 11434:11434 -v ollama:/root/.ollama ollama/ollama
docker exec -it <container> ollama pull llama3.2:1b
```

Available Models: any model you have pulled into Ollama (shown by `ollama list`): llama3.2:1b, llama3.2:3b, mistral, llava, deepseek-r1:8b, etc.
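
Before pointing PromptArena at it, you can sanity-check that the server is reachable and your model is pulled via Ollama's HTTP API:

```sh
# Lists the models available to the local Ollama server
curl http://localhost:11434/api/tags
```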

To test against a self-hosted vLLM server (which exposes an OpenAI-compatible API):

```yaml
# providers/vllm.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: vllm-llama
  labels:
    provider: vllm
spec:
  type: vllm
  model: meta-llama/Llama-3.2-3B-Instruct
  base_url: http://localhost:8000
  additional_config:
    use_beam_search: false
    best_of: 1
    # Guided decoding for structured output
    # guided_json: '{"type": "object", "properties": {...}}'
  defaults:
    temperature: 0.7
    max_tokens: 2048
```

No API key required - vLLM runs as a self-hosted service. Start vLLM with Docker:

```sh
# GPU-accelerated (recommended)
docker run --rm --gpus all \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.2-3B-Instruct \
  --dtype half \
  --max-model-len 4096

# CPU-only (for testing; slow)
docker run --rm \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.2-1B-Instruct \
  --max-model-len 2048
```

Or use Docker Compose:

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    ports:
      - "8000:8000"
    volumes:
      - vllm_cache:/root/.cache/huggingface
    command:
      - --model
      - meta-llama/Llama-3.2-3B-Instruct
      - --dtype
      - half
      - --max-model-len
      - "4096"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  vllm_cache:
```

Available Models: Any HuggingFace model supported by vLLM - Llama 3.x, Mistral, Qwen, Phi, LLaVA for vision, etc. See vLLM docs.
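
Because vLLM serves an OpenAI-compatible API, a quick request confirms the server is up and shows which model it is serving:

```sh
# Lists the model(s) served by this vLLM instance
curl http://localhost:8000/v1/models
```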

Advanced Features:

```yaml
# Guided JSON output
spec:
  additional_config:
    guided_json: |
      {
        "type": "object",
        "properties": {
          "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
          "confidence": {"type": "number", "minimum": 0, "maximum": 1}
        },
        "required": ["sentiment", "confidence"]
      }

# Regex-constrained output
spec:
  additional_config:
    guided_regex: "^[0-9]{3}-[0-9]{3}-[0-9]{4}$"  # Phone number format

# Choice selection
spec:
  additional_config:
    guided_choice: ["yes", "no", "maybe"]
```

Reference providers in your arena.yaml:

```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Arena
metadata:
  name: multi-provider-arena
spec:
  prompt_configs:
    - id: support
      file: prompts/support.yaml
  providers:
    - file: providers/openai.yaml
    - file: providers/claude.yaml
    - file: providers/gemini.yaml
  scenarios:
    - file: scenarios/customer-support.yaml
```

Set API keys as environment variables:

```sh
# Add to ~/.zshrc or ~/.bashrc
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."

# Reload shell configuration
source ~/.zshrc
```

Create a .env file (never commit this):

```sh
# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=...
```
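
To keep the file out of version control, add it to your .gitignore:

```sh
# Ensure the .env file is never committed
echo ".env" >> .gitignore
```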

Load environment variables before running:

```sh
# Load .env and run tests
export $(cat .env | xargs) && promptarena run
```
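
Note that the `export $(cat .env | xargs)` pattern breaks if any value contains spaces; a more robust alternative in POSIX-style shells is to auto-export while sourcing the file:

```sh
# set -a marks every variable assigned while sourcing .env for export
set -a
source .env
set +a
promptarena run
```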

For GitHub Actions, GitLab CI, or other platforms:

```yaml
# .github/workflows/test.yml
env:
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```
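
A complete job might look roughly like this (a sketch: the job name, checkout step, and install comment are assumptions about your setup; only the env wiring and the `promptarena run` command come from this guide):

```yaml
# Illustrative GitHub Actions workflow (sketch)
name: prompt-tests
on: [push]

jobs:
  promptarena:
    runs-on: ubuntu-latest
    env:
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      # Install promptarena here; the method depends on how it is distributed
      - run: promptarena run --concurrency 2
```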

Test across different model sizes/versions:

```yaml
# providers/openai-gpt4.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: openai-gpt4
  labels:
    provider: openai
    tier: premium
spec:
  type: openai
  model: gpt-4o
  defaults:
    temperature: 0.6
---
# providers/openai-mini.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: openai-mini
  labels:
    provider: openai
    tier: cost-effective
spec:
  type: openai
  model: gpt-4o-mini
  defaults:
    temperature: 0.6
```

Or compare different parameter settings for the same model:

```yaml
# providers/openai-creative.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: openai-creative
  labels:
    mode: creative
spec:
  type: openai
  model: gpt-4o
  defaults:
    temperature: 0.9  # More creative/random
---
# providers/openai-precise.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Provider
metadata:
  name: openai-precise
  labels:
    mode: deterministic
spec:
  type: openai
  model: gpt-4o
  defaults:
    temperature: 0.1  # More deterministic
```
Select which providers to run from the command line:

```sh
# Test with only OpenAI
promptarena run --provider openai-gpt4

# Test with multiple providers
promptarena run --provider openai-gpt4,claude-sonnet

# Test all configured providers (default)
promptarena run
```

Use labels to specify provider constraints:

```yaml
# scenarios/openai-only.yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: openai-specific-test
  labels:
    provider-specific: openai
spec:
  task_type: support
  turns:
    - role: user
      content: "Test message"
```

Override provider parameters at runtime:

```sh
# Override temperature for all providers
promptarena run --temperature 0.8

# Override max tokens
promptarena run --max-tokens 1000

# Combined overrides
promptarena run --temperature 0.9 --max-tokens 4000
```

Verify provider configuration:

```sh
# Inspect loaded providers
promptarena config-inspect
# Should show:
#   Providers:
#     ✓ openai-gpt4    (providers/openai.yaml)
#     ✓ claude-sonnet  (providers/claude.yaml)
```

If authentication fails, confirm the key is visible to the shell and rerun with verbose logging:

```sh
# Verify the API key is set
echo $OPENAI_API_KEY
# Should display: sk-...

# Test with verbose logging
promptarena run --provider openai-gpt4 --verbose
```

If a provider file is not being picked up, inspect the resolved configuration and check the paths referenced in arena.yaml:

```sh
# Check provider configuration in detail
promptarena config-inspect --verbose
# Verify the file path in arena.yaml matches the actual file location
```

Configure concurrency to avoid rate limits:

```sh
# Reduce concurrent requests
promptarena run --concurrency 2

# For large test suites, run sequentially
promptarena run --concurrency 1
```

See working provider configurations in:

- examples/customer-support/providers/
- examples/mcp-chatbot/providers/
- examples/ollama-local/providers/ - Local Ollama setup with Docker