# Providers
Understanding LLM providers in PromptKit.
## What is a Provider?

A provider is an LLM service (OpenAI, Anthropic, Google) that generates text responses. PromptKit abstracts providers behind a common interface.
## Supported Providers

### OpenAI
Section titled “OpenAI”- Models: GPT-4o, GPT-4o-mini, GPT-4-turbo, GPT-3.5-turbo
- Features: Function calling, JSON mode, vision, streaming
- Pricing: Pay per token, varies by model
### Anthropic (Claude)

- Models: Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus
- Features: Function calling, vision, 200K context, streaming
- Pricing: Pay per token, varies by model
### Google (Gemini)

- Models: Gemini 1.5 Pro, Gemini 1.5 Flash
- Features: Function calling, multimodal, 1M+ context, streaming
- Pricing: Pay per token, free tier available
### Ollama (Local)

- Models: Llama 3.2, Mistral, LLaVA, DeepSeek, Phi, and more
- Features: Function calling, vision (LLaVA), streaming, OpenAI-compatible API
- Pricing: Free (local inference, no API costs)
### vLLM (High-Performance Inference)

- Models: Any HuggingFace model, Llama 3.x, Mistral, Qwen, Phi, LLaVA, and more
- Features: Function calling, vision, streaming, guided decoding, beam search, GPU-accelerated high-throughput
- Pricing: Free (self-hosted, no API costs)
## Platform Support

PromptKit supports running models on cloud hyperscaler platforms in addition to direct API access:
### AWS Bedrock

- Authentication: Uses the AWS SDK credential chain (IRSA, IAM roles, env vars)
- Models: Claude models via Anthropic partnership
- Benefits: Enterprise security, VPC integration, no API key management
### Google Cloud Vertex AI

- Authentication: Uses GCP Application Default Credentials (Workload Identity, service accounts)
- Models: Claude and Gemini models
- Benefits: GCP integration, enterprise compliance, unified billing
### Azure AI Foundry

- Authentication: Uses Azure AD tokens (Managed Identity, service principals)
- Models: OpenAI models via Azure partnership
- Benefits: Azure integration, enterprise security, compliance
## Why Provider Abstraction?

Problem: each provider has a different API.

```go
// OpenAI specific
openai.CreateChatCompletion(...)

// Claude specific
anthropic.Messages(...)

// Gemini specific
genai.GenerateContent(...)
```

Solution: a common interface.

```go
// Works with any provider
var provider types.Provider
response, err := provider.Complete(ctx, messages, config)
```

## Provider Interface

```go
type Provider interface {
	Complete(ctx context.Context, messages []Message, config *ProviderConfig) (*ProviderResponse, error)
	CompleteStream(ctx context.Context, messages []Message, config *ProviderConfig) (StreamReader, error)
	GetProviderName() string
	Close() error
}
```

## Using Providers
### Basic Usage

```go
// Create provider
provider, err := openai.NewOpenAIProvider(apiKey, "gpt-4o-mini")
if err != nil {
	log.Fatal(err)
}
defer provider.Close()

// Send request
messages := []types.Message{
	{Role: "user", Content: "Hello"},
}
response, err := provider.Complete(ctx, messages, nil)
if err != nil {
	log.Fatal(err)
}

fmt.Println(response.Content)
```

### With Configuration
```go
config := &types.ProviderConfig{
	MaxTokens:   500,
	Temperature: 0.7,
	TopP:        0.9,
}

response, err := provider.Complete(ctx, messages, config)
```

### Streaming
```go
stream, err := provider.CompleteStream(ctx, messages, config)
if err != nil {
	log.Fatal(err)
}
defer stream.Close()

for {
	chunk, err := stream.Recv()
	if err != nil {
		break
	}
	fmt.Print(chunk.Content)
}
```

## Credential Configuration
PromptKit supports flexible credential configuration with a resolution chain.

### Resolution Order

Credentials are resolved in the following priority order (a minimal sketch of this chain follows the list):

1. `api_key`: Explicit API key in configuration
2. `credential_file`: Read the API key from a file path
3. `credential_env`: Read the API key from the specified environment variable
4. Default env vars: Fall back to provider-specific defaults (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.)
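To make the chain concrete, here is a minimal sketch of the resolution logic. `CredentialConfig` and `resolveAPIKey` are illustrative names for this example, not PromptKit's actual internals:

```go
import (
	"fmt"
	"os"
	"strings"
)

// CredentialConfig mirrors the YAML credential block
// (field names are illustrative, not PromptKit's API).
type CredentialConfig struct {
	APIKey         string // credential.api_key
	CredentialFile string // credential.credential_file
	CredentialEnv  string // credential.credential_env
}

// resolveAPIKey walks the priority order described above.
func resolveAPIKey(cred CredentialConfig, defaultEnv string) (string, error) {
	// 1. Explicit API key wins.
	if cred.APIKey != "" {
		return cred.APIKey, nil
	}
	// 2. Read the key from a file (e.g. a mounted secret).
	if cred.CredentialFile != "" {
		data, err := os.ReadFile(cred.CredentialFile)
		if err != nil {
			return "", fmt.Errorf("reading credential file: %w", err)
		}
		return strings.TrimSpace(string(data)), nil
	}
	// 3. Read the key from the configured environment variable.
	if cred.CredentialEnv != "" {
		if key := os.Getenv(cred.CredentialEnv); key != "" {
			return key, nil
		}
	}
	// 4. Fall back to the provider default (e.g. OPENAI_API_KEY).
	if key := os.Getenv(defaultEnv); key != "" {
		return key, nil
	}
	return "", fmt.Errorf("no credential found (tried config, file, env, %s)", defaultEnv)
}
```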
### Configuration Examples

```yaml
# Explicit API key (not recommended for production)
credential:
  api_key: "sk-..."

# Read from file (good for secrets management)
credential:
  credential_file: /run/secrets/openai-key

# Read from custom env var (useful for multiple providers)
credential:
  credential_env: OPENAI_PROD_API_KEY
```

### Per-Provider Credentials
Configure different credentials for the same provider type:

```yaml
providers:
  - id: openai-prod
    type: openai
    model: gpt-4o
    credential:
      credential_env: OPENAI_PROD_KEY

  - id: openai-dev
    type: openai
    model: gpt-4o-mini
    credential:
      credential_env: OPENAI_DEV_KEY
```

### Platform Credentials
For cloud platforms, credentials are handled automatically via SDK credential chains:

```yaml
# AWS Bedrock - uses IRSA, IAM roles, or AWS_* env vars
- id: claude-bedrock
  type: claude
  model: claude-3-5-sonnet-20241022
  platform:
    type: bedrock
    region: us-west-2

# GCP Vertex AI - uses Workload Identity or GOOGLE_APPLICATION_CREDENTIALS
- id: claude-vertex
  type: claude
  model: claude-3-5-sonnet-20241022
  platform:
    type: vertex
    region: us-central1
    project: my-gcp-project

# Azure AI - uses Managed Identity or AZURE_* env vars
- id: gpt4-azure
  type: openai
  model: gpt-4o
  platform:
    type: azure
    endpoint: https://my-resource.openai.azure.com
```

## Provider Configuration
Section titled “Provider Configuration”Common Parameters
Section titled “Common Parameters”type ProviderConfig struct { MaxTokens int // Output limit (default: 4096) Temperature float64 // Randomness 0-2 (default: 1.0) TopP float64 // Nucleus sampling 0-1 (default: 1.0) Seed *int // Reproducibility (optional) StopSequences []string // Stop generation (optional)}Temperature
Controls randomness:
- 0.0: Deterministic, same output
- 0.5: Balanced
- 1.0: Creative (default)
- 2.0: Very random
```go
// Factual tasks
config := &types.ProviderConfig{Temperature: 0.2}

// Creative tasks
config := &types.ProviderConfig{Temperature: 1.2}
```

### Max Tokens
Limits output length:
```go
// Short responses
config := &types.ProviderConfig{MaxTokens: 100}

// Long responses
config := &types.ProviderConfig{MaxTokens: 4096}
```

## Multi-Provider Strategies
Section titled “Multi-Provider Strategies”Fallback
Section titled “Fallback”Try providers in order:
providers := []types.Provider{primary, secondary, tertiary}
var response *types.ProviderResponsevar err error
for _, provider := range providers { response, err = provider.Complete(ctx, messages, config) if err == nil { break // Success } log.Printf("Provider %s failed: %v", provider.GetProviderName(), err)}Load Balancing
Distribute requests across providers round-robin:

```go
var (
	providers = []types.Provider{openai1, openai2, claude}
	current   uint64
)

// GetNextProvider rotates through providers; the atomic counter
// keeps it safe under concurrent use.
func GetNextProvider() types.Provider {
	n := atomic.AddUint64(&current, 1) - 1
	return providers[n%uint64(len(providers))]
}
```

### Cost-Based Routing
Route by cost:

```go
func SelectProvider(complexity string) types.Provider {
	switch complexity {
	case "simple":
		return gpt4oMini // Cheapest
	case "complex":
		return gpt4o // Best quality
	case "long_context":
		return gemini // Largest context
	default:
		return gpt4oMini
	}
}
```

### Quality-Based Routing
Route by quality needs:

```go
func SelectByQuality(task string) types.Provider {
	if task == "code_generation" {
		return gpt4o // Best for code
	} else if task == "long_document" {
		return claude // Best for long docs
	} else {
		return gpt4oMini // Good enough
	}
}
```

## Provider Comparison
Section titled “Provider Comparison”Performance
Section titled “Performance”| Provider | Latency | Throughput | Context |
|---|---|---|---|
| GPT-4o-mini | ~500ms | High | 128K |
| GPT-4o | ~1s | Medium | 128K |
| Claude Sonnet | ~1s | Medium | 200K |
| Claude Haiku | ~400ms | High | 200K |
| Gemini Flash | ~600ms | High | 1M+ |
| Gemini Pro | ~1.5s | Medium | 1M+ |
### Pricing

| Provider | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o-mini | $0.15 | $0.60 |
| GPT-4o | $2.50 | $10.00 |
| Claude Haiku | $0.25 | $1.25 |
| Claude Sonnet | $3.00 | $15.00 |
| Gemini Flash | $0.075 | $0.30 |
| Gemini Pro | $1.25 | $5.00 |
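As a rough budgeting aid, the sketch below turns the pricing table into a per-request cost estimate. The rates are copied from the table above and will drift as providers update their pricing:

```go
package main

import "fmt"

// USD per 1M tokens, copied from the pricing table above.
var pricing = map[string]struct{ in, out float64 }{
	"gpt-4o-mini":   {0.15, 0.60},
	"gpt-4o":        {2.50, 10.00},
	"claude-haiku":  {0.25, 1.25},
	"claude-sonnet": {3.00, 15.00},
	"gemini-flash":  {0.075, 0.30},
	"gemini-pro":    {1.25, 5.00},
}

// estimateCost returns the dollar cost of one request.
func estimateCost(model string, inputTokens, outputTokens int) float64 {
	p := pricing[model]
	return float64(inputTokens)/1e6*p.in + float64(outputTokens)/1e6*p.out
}

func main() {
	// A 2,000-token prompt with a 500-token answer:
	fmt.Printf("gpt-4o:      $%.4f\n", estimateCost("gpt-4o", 2000, 500))      // $0.0100
	fmt.Printf("gpt-4o-mini: $%.4f\n", estimateCost("gpt-4o-mini", 2000, 500)) // $0.0006
}
```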
## Use Cases

- GPT-4o-mini: General purpose, cost-effective
- GPT-4o: Complex reasoning, code generation
- Claude Haiku: Fast responses, high volume
- Claude Sonnet: Long documents, analysis
- Gemini Flash: Multimodal, cost-effective
- Gemini Pro: Very long context, research
## Provider Selection Guide

The lists below summarize when each model fits best; a consolidated routing sketch follows them.

### Choose GPT-4o-mini when:

- Cost is primary concern
Section titled “Choose GPT-4o-mini when:”- Cost is primary concern
- Tasks are straightforward
- High volume needed
- Quick responses required
### Choose GPT-4o when:

- Quality is critical
- Complex reasoning needed
- Code generation
- Mathematical tasks
### Choose Claude Sonnet when:

- Long document analysis
- Detailed writing
- Research tasks
- Need 200K context
### Choose Claude Haiku when:

- Speed critical
- Simple tasks
- High throughput
- Cost-effective
### Choose Gemini Flash when:

- Multimodal input
- Cost-effective
- Good balance
- Video processing
### Choose Gemini Pro when:

- Very long context (1M+ tokens)
- Research papers
- Large codebases
- Book analysis
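Taken together, the guide reduces to a single routing helper. The sketch below encodes those rules under the assumption that the providers were created elsewhere; the variable and type names are illustrative, not part of PromptKit:

```go
// Requirements captures the criteria from the guide above.
type Requirements struct {
	ContextTokens   int  // prompt size in tokens
	NeedsVision     bool // multimodal input
	Complex         bool // complex reasoning, code, or math
	LatencyCritical bool
}

// chooseProvider applies the guide's rules in priority order.
// gpt4o, gpt4oMini, claudeSonnet, claudeHaiku, geminiFlash, and
// geminiPro are assumed to be providers constructed elsewhere.
func chooseProvider(r Requirements) types.Provider {
	switch {
	case r.ContextTokens > 200_000:
		return geminiPro // only option past Claude's 200K window
	case r.ContextTokens > 128_000:
		return claudeSonnet // beyond GPT-4o's 128K window
	case r.NeedsVision:
		return geminiFlash // multimodal and cost-effective
	case r.Complex:
		return gpt4o // quality-critical reasoning and code
	case r.LatencyCritical:
		return claudeHaiku // fastest option in the table above
	default:
		return gpt4oMini // cheap general-purpose default
	}
}
```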
## Best Practices

### Resource Management

✅ Close providers

```go
provider, _ := openai.NewOpenAIProvider(...)
defer provider.Close() // Essential
```

✅ Reuse providers
```go
// Good: One provider, many requests
provider := createProvider()
for _, prompt := range prompts {
	provider.Complete(ctx, prompt, config)
}

// Bad: New provider per request
for _, prompt := range prompts {
	provider := createProvider()
	provider.Complete(ctx, prompt, config)
	provider.Close()
}
```

### Error Handling
✅ Handle provider errors

```go
response, err := provider.Complete(ctx, messages, config)
if err != nil {
	if errors.Is(err, ErrRateLimited) {
		// Wait and retry
	} else if errors.Is(err, ErrInvalidKey) {
		// Check credentials
	} else {
		// Fallback provider
	}
}
```
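For the rate-limit branch, the usual pattern is a bounded retry with exponential backoff. A minimal sketch reusing the `ErrRateLimited` sentinel from above:

```go
import (
	"context"
	"errors"
	"fmt"
	"time"
)

// completeWithRetry retries rate-limited requests with exponential backoff.
func completeWithRetry(ctx context.Context, p types.Provider,
	messages []types.Message, config *types.ProviderConfig) (*types.ProviderResponse, error) {

	backoff := time.Second
	for attempt := 0; attempt < 5; attempt++ {
		response, err := p.Complete(ctx, messages, config)
		if err == nil {
			return response, nil
		}
		if !errors.Is(err, ErrRateLimited) {
			return nil, err // non-retryable: fail fast or fall back
		}
		select {
		case <-time.After(backoff):
			backoff *= 2 // 1s, 2s, 4s, 8s, ...
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
	return nil, fmt.Errorf("still rate limited after retries")
}
```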
### Timeouts

✅ Set context timeouts

```go
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

response, err := provider.Complete(ctx, messages, config)
```

## Testing with Providers
Section titled “Testing with Providers”Mock Provider
Section titled “Mock Provider”import "github.com/AltairaLabs/PromptKit/runtime/providers/mock"
func TestWithMock(t *testing.T) { mockProvider := mock.NewMockProvider() mockProvider.SetResponse("Test response")
response, err := mockProvider.Complete(ctx, messages, nil) assert.NoError(t, err) assert.Equal(t, "Test response", response.Content)}Benefits
Section titled “Benefits”- No API calls
- No costs
- Fast tests
- Predictable responses
- Offline testing
## Monitoring Providers

### Track Usage

```go
type ProviderMetrics struct {
	RequestCount int
	ErrorCount   int
	TotalCost    float64
	TotalTokens  int
	AvgLatency   time.Duration
}

func TrackRequest(provider string, response *ProviderResponse, err error) {
	metrics := GetMetrics(provider)
	metrics.RequestCount++

	if err != nil {
		metrics.ErrorCount++
	} else {
		metrics.TotalCost += response.Cost
		metrics.TotalTokens += response.Usage.TotalTokens
	}
}
```

### Monitor Costs
Section titled “Monitor Costs”costTracker := middleware.NewCostTracker()
// Use in pipelinepipe := pipeline.NewPipeline( middleware.ProviderMiddleware(provider, nil, costTracker, config),)
// Check costsfmt.Printf("Total cost: $%.4f\n", costTracker.TotalCost())Summary
Providers are:
- ✅ Abstracted - Common interface for all LLMs
- ✅ Flexible - Easy to switch or combine
- ✅ Configurable - Fine-tune behavior
- ✅ Testable - Mock for unit tests
- ✅ Monitorable - Track usage and costs
## Related Documentation

- Provider System Explanation - Architecture details
- Provider Reference - API documentation
- Cloud Provider Examples - Bedrock, Vertex, Azure examples
- Multi-Provider Fallback - Implementation guide
- Cost Monitoring - Track expenses