Configure LLM Providers
Learn how to configure and manage LLM providers for testing.
Overview
Providers define how PromptArena connects to different LLM services (OpenAI, Anthropic, Google, etc.). Each provider configuration specifies authentication, model selection, and default parameters.
Quick Start with Templates
The easiest way to set up providers is to use the project generator:

    # Create project with OpenAI
    promptarena init my-test --quick --provider openai

    # Or choose during interactive setup
    promptarena init my-test
    # Select provider when prompted: openai, anthropic, google, or mock

This automatically creates a working provider configuration with:
- Correct API version and schema
- Recommended model defaults
- Environment variable setup (.env file)
- Ready-to-use configuration
Manual Provider Configuration
For custom setups or advanced configurations, create provider files in the providers/ directory:

    # providers/openai.yaml
    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Provider
    metadata:
      name: openai-gpt4o-mini
      labels:
        provider: openai

    spec:
      type: openai
      model: gpt-4o-mini

      capabilities:
        - text
        - streaming
        - vision
        - tools
        - json

      defaults:
        temperature: 0.6
        max_tokens: 2000
        top_p: 1.0

Authentication uses the OPENAI_API_KEY environment variable automatically.
Provider Capabilities
The capabilities field declares which features a provider supports. Scenarios can then use required_capabilities to run only against providers with matching capabilities.
Available Capabilities:
| Capability | Description |
|---|---|
| text | Basic text completion |
| streaming | Streaming responses |
| vision | Image understanding |
| tools | Function/tool calling |
| json | JSON mode output |
| audio | Audio input understanding |
| video | Video input understanding |
| documents | PDF/document upload support |
| duplex | Real-time bidirectional audio |
Example - Vision-capable provider:

    spec:
      type: openai
      model: gpt-4o
      capabilities:
        - text
        - streaming
        - vision
        - tools
        - json

Example - Audio-only provider:

    spec:
      type: openai
      model: gpt-4o-audio-preview
      capabilities:
        - audio
        - duplex

Example - Local model with limited capabilities:

    spec:
      type: ollama
      model: llama3.2:3b
      capabilities:
        - text
        - streaming
        - tools
        - json
      # Note: llama3.2:3b does NOT support vision (only 11B/90B models do)

When a scenario specifies required_capabilities, only providers with ALL listed capabilities will run that scenario. See Write Scenarios for details.
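The "ALL listed capabilities" rule is a subset test. The sketch below illustrates that rule in Python using the capability lists from the three examples above; the function name `eligible_providers` and the dictionary layout are hypothetical and are not PromptArena's actual implementation.

```python
def eligible_providers(providers, required_capabilities):
    """Return names of providers that declare every required capability."""
    required = set(required_capabilities)
    return [
        name
        for name, capabilities in providers.items()
        if required <= set(capabilities)  # subset test: all required are declared
    ]


# Capability lists copied from the three example providers above.
providers = {
    "gpt-4o": ["text", "streaming", "vision", "tools", "json"],
    "gpt-4o-audio-preview": ["audio", "duplex"],
    "llama3.2:3b": ["text", "streaming", "tools", "json"],
}

print(eligible_providers(providers, ["text", "vision"]))  # ['gpt-4o']
print(eligible_providers(providers, ["text", "tools"]))   # ['gpt-4o', 'llama3.2:3b']
```

A scenario that requires a capability no provider declares (for example video in this set) would match nothing, which is worth checking before a long run.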
Supported Providers
OpenAI

    # providers/openai-gpt4.yaml
    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Provider
    metadata:
      name: openai-gpt4o
      labels:
        provider: openai

    spec:
      type: openai
      model: gpt-4o

      defaults:
        temperature: 0.7
        max_tokens: 4000

Available Models: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo
Anthropic Claude
    # providers/claude.yaml
    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Provider
    metadata:
      name: claude-sonnet
      labels:
        provider: anthropic

    spec:
      type: anthropic
      model: claude-3-5-sonnet-20241022

      defaults:
        temperature: 0.6
        max_tokens: 4000

Authentication uses the ANTHROPIC_API_KEY environment variable automatically.
Available Models: claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022, claude-3-opus-20240229
Google Gemini
    # providers/gemini.yaml
    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Provider
    metadata:
      name: gemini-flash
      labels:
        provider: google

    spec:
      type: gemini
      model: gemini-1.5-flash

      defaults:
        temperature: 0.7
        max_tokens: 2000

Authentication uses the GOOGLE_API_KEY environment variable automatically.
Available Models: gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash-exp
Azure OpenAI
Azure OpenAI uses type: openai with Azure platform authentication. The provider auto-derives the deployment URL from platform.endpoint and model:

    # providers/azure-openai.yaml
    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Provider
    metadata:
      name: azure-openai-gpt4o
      labels:
        provider: azure-openai

    spec:
      type: openai
      model: gpt-4o

      platform:
        type: azure
        endpoint: https://your-resource.openai.azure.com
        additional_config:
          api_version: "2024-12-01-preview"  # optional, this is the default

      defaults:
        temperature: 0.6
        max_tokens: 2000

Authentication uses the Azure Default Credential chain automatically (Managed Identity in AKS/Azure VMs, Azure CLI, or the environment variables AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET).
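To make the URL derivation concrete, here is a minimal sketch that assumes the provider follows Azure OpenAI's public REST layout, with the model name used as the deployment name. The helper `azure_chat_url` is hypothetical, and the exact path PromptArena builds may differ.

```python
from urllib.parse import quote, urlencode


def azure_chat_url(endpoint, model, api_version="2024-12-01-preview"):
    """Build a chat-completions URL in Azure OpenAI's REST layout.

    Assumption: the deployment name equals the configured model name.
    """
    base = endpoint.rstrip("/")            # tolerate a trailing slash in the config
    deployment = quote(model, safe="")     # percent-encode anything unsafe in a path segment
    query = urlencode({"api-version": api_version})
    return f"{base}/openai/deployments/{deployment}/chat/completions?{query}"


print(azure_chat_url("https://your-resource.openai.azure.com", "gpt-4o"))
```

If your Azure deployment name differs from the model name, this assumption breaks, which is a common cause of 404 responses from Azure endpoints.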
For API key auth instead of Azure AD, use the credential field:

    spec:
      type: openai
      model: gpt-4o
      platform:
        type: azure
        endpoint: https://your-resource.openai.azure.com
      credential:
        credential_env: AZURE_OPENAI_API_KEY

Ollama (Local)
    # providers/ollama.yaml
    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Provider
    metadata:
      name: ollama-llama
      labels:
        provider: ollama

    spec:
      type: ollama
      model: llama3.2:1b
      base_url: http://localhost:11434

      additional_config:
        keep_alive: "5m"  # Keep model loaded for 5 minutes

      defaults:
        temperature: 0.7
        max_tokens: 2048

No API key is required because Ollama runs locally. Start Ollama with:

    # Install Ollama
    brew install ollama  # macOS
    # or visit https://ollama.ai for other platforms

    # Start Ollama server
    ollama serve

    # Pull a model
    ollama pull llama3.2:1b

Or use Docker:

    docker run -d -p 11434:11434 -v ollama:/root/.ollama ollama/ollama
    docker exec -it <container> ollama pull llama3.2:1b

Available Models: Any model shown by ollama list, such as llama3.2:1b, llama3.2:3b, mistral, llava, deepseek-r1:8b, etc.
vLLM (High-Performance)
    # providers/vllm.yaml
    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Provider
    metadata:
      name: vllm-llama
      labels:
        provider: vllm

    spec:
      type: vllm
      model: meta-llama/Llama-3.2-3B-Instruct
      base_url: http://localhost:8000

      additional_config:
        use_beam_search: false
        best_of: 1
        # Guided decoding for structured output
        # guided_json: '{"type": "object", "properties": {...}}'

      defaults:
        temperature: 0.7
        max_tokens: 2048

No API key is required because vLLM runs as a self-hosted service. Start vLLM with Docker:

    # GPU-accelerated (recommended)
    docker run --rm --gpus all \
      -p 8000:8000 \
      vllm/vllm-openai:latest \
      --model meta-llama/Llama-3.2-3B-Instruct \
      --dtype half \
      --max-model-len 4096

    # CPU-only (for testing, slow)
    docker run --rm \
      -p 8000:8000 \
      vllm/vllm-openai:latest \
      --model meta-llama/Llama-3.2-1B-Instruct \
      --max-model-len 2048

Or use Docker Compose:

    services:
      vllm:
        image: vllm/vllm-openai:latest
        ports:
          - "8000:8000"
        volumes:
          - vllm_cache:/root/.cache/huggingface
        command:
          - --model
          - meta-llama/Llama-3.2-3B-Instruct
          - --dtype
          - half
          - --max-model-len
          - "4096"
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities: [gpu]

    volumes:
      vllm_cache:

Available Models: Any HuggingFace model supported by vLLM, such as Llama 3.x, Mistral, Qwen, Phi, LLaVA for vision, etc. See vLLM docs.
Advanced Features:
    # Guided JSON output
    spec:
      additional_config:
        guided_json: |
          {
            "type": "object",
            "properties": {
              "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
              "confidence": {"type": "number", "minimum": 0, "maximum": 1}
            },
            "required": ["sentiment", "confidence"]
          }

    # Regex-constrained output
    spec:
      additional_config:
        guided_regex: "^[0-9]{3}-[0-9]{3}-[0-9]{4}$"  # Phone number format

    # Choice selection
    spec:
      additional_config:
        guided_choice: ["yes", "no", "maybe"]

OpenAI-Compatible Gateways
Many third-party services expose an OpenAI-compatible API: OpenRouter, Groq, Together AI, Fireworks AI, LiteLLM, and self-hosted proxies. You can point the built-in openai provider at any of them with base_url and (optionally) headers.
The headers field injects custom HTTP headers into every request this provider sends. It’s a top-level field on the provider spec. Header values are plain strings — use the credential field for secrets. If a custom header collides with a built-in header the provider sets itself (e.g. Authorization, Content-Type), the request fails fast with an error.
OpenRouter
OpenRouter routes requests across hundreds of models through a single OpenAI-compatible endpoint. Its documentation recommends setting HTTP-Referer and X-Title for app attribution and leaderboard ranking:

    # providers/openrouter.yaml
    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Provider
    metadata:
      name: openrouter-claude
      labels:
        gateway: openrouter

    spec:
      type: openai
      model: anthropic/claude-sonnet-4-20250514
      base_url: https://openrouter.ai/api/v1

      headers:
        HTTP-Referer: https://myapp.com
        X-Title: My App

      credential:
        credential_env: OPENROUTER_API_KEY

      defaults:
        temperature: 0.6
        max_tokens: 2000

Then export your key:

    export OPENROUTER_API_KEY="sk-or-v1-..."

OpenRouter accepts any model identifier from its catalog: openai/gpt-4o-mini, anthropic/claude-sonnet-4-20250514, meta-llama/llama-3.3-70b-instruct, etc.
Groq

Groq offers ultra-fast inference for open-source models:

    # providers/groq.yaml
    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Provider
    metadata:
      name: groq-llama
    spec:
      type: openai
      model: llama-3.3-70b-versatile
      base_url: https://api.groq.com/openai/v1
      credential:
        credential_env: GROQ_API_KEY

No custom headers are needed: Groq's OpenAI-compatible endpoint works with base_url alone.
Together AI
    # providers/together.yaml
    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Provider
    metadata:
      name: together-llama
    spec:
      type: openai
      model: meta-llama/Llama-3.3-70B-Instruct-Turbo
      base_url: https://api.together.xyz/v1
      credential:
        credential_env: TOGETHER_API_KEY

Self-Hosted LiteLLM Proxy
If you run LiteLLM as an on-prem gateway, point PromptArena at it and use a custom auth header if your proxy is secured internally:

    # providers/litellm.yaml
    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Provider
    metadata:
      name: litellm-internal
    spec:
      type: openai
      model: gpt-4o-mini  # whatever LiteLLM routes this name to
      base_url: http://litellm.internal:4000
      headers:
        X-Internal-Auth: shared-gateway-token

How It Works
Any provider can declare headers, not just OpenAI-compatible gateways. Custom headers are applied after the provider sets its own built-in headers (auth, content type, etc.), so a collision is detected and rejected before any bytes leave the client. Do not try to override a built-in header; use the credential field for auth instead.
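The fail-fast collision check can be pictured as follows. This is a minimal Python sketch, not PromptArena's code: the function `merge_headers` is hypothetical, and comparing names case-insensitively is an assumption based on HTTP header names being case-insensitive.

```python
def merge_headers(built_in, custom):
    """Merge custom headers into built-in ones, rejecting any collision."""
    reserved = {name.lower() for name in built_in}
    collisions = sorted(name for name in custom if name.lower() in reserved)
    if collisions:
        # Raise before any request is built, so nothing is sent on a bad config.
        raise ValueError(f"custom headers collide with built-in headers: {collisions}")
    return {**built_in, **custom}


# Built-in headers a provider might set itself (placeholder values).
built_in = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

# Attribution headers like the OpenRouter example pass through untouched.
merged = merge_headers(built_in, {"HTTP-Referer": "https://myapp.com", "X-Title": "My App"})
print(sorted(merged))

# A custom auth header is rejected, whatever its letter case.
try:
    merge_headers(built_in, {"authorization": "Bearer other"})
except ValueError as err:
    print(err)
```

Rejecting the configuration outright, rather than letting one header silently win, keeps an accidental override from sending the wrong credentials.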
Streaming and Reliability
Provider configs support streaming retry, concurrency limits, and connection pool tuning. These are all optional and default to safe values.
Streaming Retry
Recover from transient HTTP/2 stream resets without failing the test:

    spec:
      type: openai
      model: gpt-5-pro
      stream_retry:
        enabled: true
        max_attempts: 2                # initial + 1 retry
        retry_window: pre_first_chunk  # safe default
        budget:
          rate_per_sec: 5
          burst: 10

Set retry_window: always to also retry mid-stream failures (the response is discarded and re-requested from scratch, costing additional tokens):

    stream_retry:
      enabled: true
      retry_window: always  # costs tokens on mid-stream retry

Concurrency and Timeouts

    spec:
      type: openai
      model: gpt-4o
      request_timeout: "60s"      # non-streaming call timeout
      stream_idle_timeout: "30s"  # abort if stream goes silent
      stream_max_concurrent: 50   # max parallel streams (0 = unlimited)

Connection Pool

Raise max_conns_per_host when running high-concurrency test suites against a single provider:

    spec:
      type: openai
      model: gpt-4o
      http_transport:
        max_conns_per_host: 250  # default: 100
        max_idle_conns_per_host: 250
        idle_conn_timeout: "90s"

The effective concurrent-stream ceiling is max_conns_per_host multiplied by the upstream's HTTP/2 max concurrent streams setting (typically 100-256). The promptkit_http_conns_in_use Prometheus gauge tracks pool pressure.
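As a quick worked example of that multiplication, the sketch below computes the ceiling for the default pool size (100) and the raised value (250), using 100 and 256 as the two ends of the typical upstream range. The function name `stream_ceiling` is hypothetical, and the real limit depends on what the upstream server actually advertises.

```python
def stream_ceiling(max_conns_per_host, h2_max_concurrent_streams):
    """Upper bound on concurrent streams: connections times streams per connection."""
    return max_conns_per_host * h2_max_concurrent_streams


for conns in (100, 250):
    low = stream_ceiling(conns, 100)   # upstream allows 100 streams per connection
    high = stream_ceiling(conns, 256)  # upstream allows 256 streams per connection
    print(f"max_conns_per_host={conns}: {low:,} to {high:,} concurrent streams")

# max_conns_per_host=100: 10,000 to 25,600 concurrent streams
# max_conns_per_host=250: 25,000 to 64,000 concurrent streams
```

Even the default pool leaves a ceiling far above the stream_max_concurrent: 50 shown earlier, so raising max_conns_per_host is only likely to matter for very large suites.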
Arena Configuration
Reference providers in your arena.yaml:

    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Arena
    metadata:
      name: multi-provider-arena

    spec:
      prompt_configs:
        - id: support
          file: prompts/support.yaml

      providers:
        - file: providers/openai.yaml
        - file: providers/claude.yaml
        - file: providers/gemini.yaml

      scenarios:
        - file: scenarios/customer-support.yaml

Authentication Setup
Environment Variables

Set API keys as environment variables:

    # Add to ~/.zshrc or ~/.bashrc
    export OPENAI_API_KEY="sk-..."
    export ANTHROPIC_API_KEY="sk-ant-..."
    export GOOGLE_API_KEY="..."

    # Reload shell configuration
    source ~/.zshrc

.env File (Local Development)
Create a .env file (never commit this):

    # .env
    OPENAI_API_KEY=sk-...
    ANTHROPIC_API_KEY=sk-ant-...
    GOOGLE_API_KEY=...

Load environment variables before running:

    # Load .env (comment lines are skipped), export its variables, and run tests
    set -a && source .env && set +a && promptarena run

CI/CD Secrets
For GitHub Actions, GitLab CI, or other platforms:

    # .github/workflows/test.yml
    env:
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

Common Configurations
Multiple Model Variants

Test across different model sizes/versions:

    # providers/openai-gpt4.yaml
    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Provider
    metadata:
      name: openai-gpt4
      labels:
        provider: openai
        tier: premium

    spec:
      type: openai
      model: gpt-4o
      defaults:
        temperature: 0.6

    ---
    # providers/openai-mini.yaml
    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Provider
    metadata:
      name: openai-mini
      labels:
        provider: openai
        tier: cost-effective

    spec:
      type: openai
      model: gpt-4o-mini
      defaults:
        temperature: 0.6

Temperature Variations
    # providers/openai-creative.yaml
    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Provider
    metadata:
      name: openai-creative
      labels:
        mode: creative

    spec:
      type: openai
      model: gpt-4o
      defaults:
        temperature: 0.9  # More creative/random

    ---
    # providers/openai-precise.yaml
    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Provider
    metadata:
      name: openai-precise
      labels:
        mode: deterministic

    spec:
      type: openai
      model: gpt-4o
      defaults:
        temperature: 0.1  # More deterministic

Provider Selection
Run Specific Providers

    # Test with only OpenAI
    promptarena run --provider openai-gpt4

    # Test with multiple providers
    promptarena run --provider openai-gpt4,claude-sonnet

    # Test all configured providers (default)
    promptarena run

Scenario-specific Providers
Use labels to specify provider constraints:

    # scenarios/openai-only.yaml
    apiVersion: promptkit.altairalabs.ai/v1alpha1
    kind: Scenario
    metadata:
      name: openai-specific-test
      labels:
        provider-specific: openai

    spec:
      task_type: support

      turns:
        - role: user
          content: "Test message"

Parameter Overrides
Override provider parameters at runtime:

    # Override temperature for all providers
    promptarena run --temperature 0.8

    # Override max tokens
    promptarena run --max-tokens 1000

    # Combined overrides
    promptarena run --temperature 0.9 --max-tokens 4000

Validation
Verify provider configuration:

    # Inspect loaded providers
    promptarena config-inspect

    # Should show:
    # Providers:
    #   ✓ openai-gpt4 (providers/openai.yaml)
    #   ✓ claude-sonnet (providers/claude.yaml)

Troubleshooting
Authentication Errors

    # Verify API key is set
    echo $OPENAI_API_KEY
    # Should display: sk-...

    # Test with verbose logging
    promptarena run --provider openai-gpt4 --verbose

Provider Not Found
    # Check provider configuration
    promptarena config-inspect --verbose

    # Verify file path in arena.yaml matches actual file location

Rate Limiting
Configure concurrency to avoid rate limits:

    # Reduce concurrent requests
    promptarena run --concurrency 2

    # For large test suites
    promptarena run --concurrency 1  # Sequential execution

Next Steps
- Use Mock Providers - Test without API calls
- Validate Outputs - Add assertions
- Integrate CI/CD - Automate testing
- Config Reference - Complete configuration options
Examples
See working provider configurations in:

- examples/customer-support/providers/
- examples/mcp-chatbot/providers/
- examples/ollama-local/providers/ - Local Ollama setup with Docker