Providers

Understanding LLM providers in PromptKit.

A provider is an LLM service (OpenAI, Anthropic, Google) that generates text responses. PromptKit abstracts providers behind a common interface.

OpenAI

  • Models: GPT-4o, GPT-4o-mini, GPT-4-turbo, GPT-3.5-turbo
  • Features: Function calling, JSON mode, vision, streaming
  • Pricing: Pay per token, varies by model

Anthropic

  • Models: Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus
  • Features: Function calling, vision, 200K context, streaming
  • Pricing: Pay per token, varies by model

Google Gemini

  • Models: Gemini 1.5 Pro, Gemini 1.5 Flash
  • Features: Function calling, multimodal, 1M+ context, streaming
  • Pricing: Pay per token, free tier available

Ollama

  • Models: Llama 3.2, Mistral, LLaVA, DeepSeek, Phi, and more
  • Features: Function calling, vision (LLaVA), streaming, OpenAI-compatible API
  • Pricing: Free (local inference, no API costs)

vLLM

  • Models: Any HuggingFace model, Llama 3.x, Mistral, Qwen, Phi, LLaVA, and more
  • Features: Function calling, vision, streaming, guided decoding, beam search, GPU-accelerated high throughput
  • Pricing: Free (self-hosted, no API costs)

PromptKit supports running models on cloud hyperscaler platforms in addition to direct API access:

AWS Bedrock

  • Authentication: Uses AWS SDK credential chain (IRSA, IAM roles, env vars)
  • Models: Claude models via Anthropic partnership
  • Benefits: Enterprise security, VPC integration, no API key management

GCP Vertex AI

  • Authentication: Uses GCP Application Default Credentials (Workload Identity, service accounts)
  • Models: Claude and Gemini models
  • Benefits: GCP integration, enterprise compliance, unified billing

Azure AI

  • Authentication: Uses Azure AD tokens (Managed Identity, service principals)
  • Models: OpenAI models via Azure partnership
  • Benefits: Azure integration, enterprise security, compliance

Problem: Each provider has different APIs

// OpenAI specific
openai.CreateChatCompletion(...)
// Claude specific
anthropic.Messages(...)
// Gemini specific
genai.GenerateContent(...)

Solution: Common interface

// Works with any provider
var provider types.Provider
response, err := provider.Complete(ctx, messages, config)

type Provider interface {
    Complete(ctx context.Context, messages []Message, config *ProviderConfig) (*ProviderResponse, error)
    CompleteStream(ctx context.Context, messages []Message, config *ProviderConfig) (StreamReader, error)
    GetProviderName() string
    Close() error
}

// Create provider
provider, err := openai.NewOpenAIProvider(apiKey, "gpt-4o-mini")
if err != nil {
    log.Fatal(err)
}
defer provider.Close()

// Send request
messages := []types.Message{
    {Role: "user", Content: "Hello"},
}

response, err := provider.Complete(ctx, messages, nil)
if err != nil {
    log.Fatal(err)
}

fmt.Println(response.Content)
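
Because every backend satisfies the same types.Provider interface, request code can be written once and reused with any provider. A minimal sketch (ask is an illustrative helper, not part of PromptKit):

// ask sends a single user message through any provider that
// implements the types.Provider interface.
func ask(ctx context.Context, p types.Provider, prompt string) (string, error) {
    messages := []types.Message{
        {Role: "user", Content: prompt},
    }
    resp, err := p.Complete(ctx, messages, nil)
    if err != nil {
        return "", err
    }
    return resp.Content, nil
}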

config := &types.ProviderConfig{
    MaxTokens:   500,
    Temperature: 0.7,
    TopP:        0.9,
}

response, err := provider.Complete(ctx, messages, config)

stream, err := provider.CompleteStream(ctx, messages, config)
if err != nil {
    log.Fatal(err)
}
defer stream.Close()

for {
    chunk, err := stream.Recv()
    if err != nil {
        break
    }
    fmt.Print(chunk.Content)
}
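
The loop above stops on any error, including the one that signals the end of the stream. To tell normal termination apart from a real failure, a sketch like the following can help, assuming PromptKit's StreamReader follows the common Go convention of returning io.EOF when the stream is exhausted (an assumption, not confirmed here):

for {
    chunk, err := stream.Recv()
    if errors.Is(err, io.EOF) {
        break // stream ended normally
    }
    if err != nil {
        log.Printf("stream error: %v", err) // a real failure, not end of stream
        break
    }
    fmt.Print(chunk.Content)
}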

PromptKit supports flexible credential configuration with a resolution chain:

Credentials are resolved in the following priority order:

  1. api_key: Explicit API key in configuration
  2. credential_file: Read API key from a file path
  3. credential_env: Read API key from specified environment variable
  4. Default env vars: Fall back to provider-specific defaults (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.)
# Explicit API key (not recommended for production)
credential:
  api_key: "sk-..."

# Read from file (good for secrets management)
credential:
  credential_file: /run/secrets/openai-key

# Read from custom env var (useful for multiple providers)
credential:
  credential_env: OPENAI_PROD_API_KEY
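
The resolution logic itself is simple. The sketch below illustrates the priority order described above; resolveAPIKey and the Credential struct are illustrative, not PromptKit's actual internals:

// Credential mirrors the YAML credential block (illustrative only).
type Credential struct {
    APIKey         string // api_key
    CredentialFile string // credential_file
    CredentialEnv  string // credential_env
}

// resolveAPIKey applies the priority order: explicit key, file,
// custom env var, then the provider-specific default env var.
func resolveAPIKey(c Credential, defaultEnv string) (string, error) {
    if c.APIKey != "" {
        return c.APIKey, nil
    }
    if c.CredentialFile != "" {
        data, err := os.ReadFile(c.CredentialFile)
        if err != nil {
            return "", err
        }
        return strings.TrimSpace(string(data)), nil
    }
    if c.CredentialEnv != "" {
        if v := os.Getenv(c.CredentialEnv); v != "" {
            return v, nil
        }
    }
    if v := os.Getenv(defaultEnv); v != "" { // e.g. OPENAI_API_KEY
        return v, nil
    }
    return "", fmt.Errorf("no API key found")
}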

Configure different credentials for the same provider type:

providers:
  - id: openai-prod
    type: openai
    model: gpt-4o
    credential:
      credential_env: OPENAI_PROD_KEY
  - id: openai-dev
    type: openai
    model: gpt-4o-mini
    credential:
      credential_env: OPENAI_DEV_KEY

For cloud platforms, credentials are handled automatically via SDK credential chains:

# AWS Bedrock - uses IRSA, IAM roles, or AWS_* env vars
- id: claude-bedrock
  type: claude
  model: claude-3-5-sonnet-20241022
  platform:
    type: bedrock
    region: us-west-2

# GCP Vertex AI - uses Workload Identity or GOOGLE_APPLICATION_CREDENTIALS
- id: claude-vertex
  type: claude
  model: claude-3-5-sonnet-20241022
  platform:
    type: vertex
    region: us-central1
    project: my-gcp-project

# Azure AI - uses Managed Identity or AZURE_* env vars
- id: gpt4-azure
  type: openai
  model: gpt-4o
  platform:
    type: azure
    endpoint: https://my-resource.openai.azure.com

type ProviderConfig struct {
    MaxTokens     int      // Output limit (default: 4096)
    Temperature   float64  // Randomness 0-2 (default: 1.0)
    TopP          float64  // Nucleus sampling 0-1 (default: 1.0)
    Seed          *int     // Reproducibility (optional)
    StopSequences []string // Stop generation (optional)
}
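
Seed and StopSequences are optional. A config that sets every field might look like this (the values are illustrative):

seed := 42
config := &types.ProviderConfig{
    MaxTokens:     1024,
    Temperature:   0.3,
    TopP:          0.9,
    Seed:          &seed,                 // ask the provider for reproducible sampling
    StopSequences: []string{"\n\nUser:"}, // stop before the model starts a new turn
}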

Temperature

Controls randomness:

  • 0.0: Deterministic, same output
  • 0.5: Balanced
  • 1.0: Creative (default)
  • 2.0: Very random
// Factual tasks
config := &types.ProviderConfig{Temperature: 0.2}
// Creative tasks
config := &types.ProviderConfig{Temperature: 1.2}

MaxTokens

Limits output length:

// Short responses
config := &types.ProviderConfig{MaxTokens: 100}
// Long responses
config := &types.ProviderConfig{MaxTokens: 4096}

Try providers in order:

providers := []types.Provider{primary, secondary, tertiary}

var response *types.ProviderResponse
var err error
for _, provider := range providers {
    response, err = provider.Complete(ctx, messages, config)
    if err == nil {
        break // Success
    }
    log.Printf("Provider %s failed: %v", provider.GetProviderName(), err)
}
// If err is still non-nil here, every provider failed.

Distribute across providers:

var (
    providers = []types.Provider{openai1, openai2, claude}
    current   int
)

// GetNextProvider rotates through providers in round-robin order.
func GetNextProvider() types.Provider {
    provider := providers[current%len(providers)]
    current++
    return provider
}
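
The counter above is not safe for concurrent use. If several goroutines select providers at once, a sketch using sync/atomic avoids the race (same illustrative providers slice as above):

var next atomic.Uint64 // "sync/atomic", Go 1.19+

// GetNextProviderSafe is a concurrency-safe round-robin selector.
func GetNextProviderSafe() types.Provider {
    n := next.Add(1) - 1
    return providers[n%uint64(len(providers))]
}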

Route by cost:

func SelectProvider(complexity string) types.Provider {
    switch complexity {
    case "simple":
        return gpt4oMini // Cheapest
    case "complex":
        return gpt4o // Best quality
    case "long_context":
        return gemini // Largest context
    default:
        return gpt4oMini
    }
}

Route by quality needs:

func SelectByQuality(task string) types.Provider {
    if task == "code_generation" {
        return gpt4o // Best for code
    } else if task == "long_document" {
        return claude // Best for long docs
    } else {
        return gpt4oMini // Good enough
    }
}

| Provider | Latency | Throughput | Context |
| --- | --- | --- | --- |
| GPT-4o-mini | ~500ms | High | 128K |
| GPT-4o | ~1s | Medium | 128K |
| Claude Sonnet | ~1s | Medium | 200K |
| Claude Haiku | ~400ms | High | 200K |
| Gemini Flash | ~600ms | High | 1M+ |
| Gemini Pro | ~1.5s | Medium | 1M+ |

| Provider | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| GPT-4o-mini | $0.15 | $0.60 |
| GPT-4o | $2.50 | $10.00 |
| Claude Haiku | $0.25 | $1.25 |
| Claude Sonnet | $3.00 | $15.00 |
| Gemini Flash | $0.075 | $0.30 |
| Gemini Pro | $1.25 | $5.00 |

GPT-4o-mini: General purpose, cost-effective
GPT-4o: Complex reasoning, code generation
Claude Haiku: Fast responses, high volume
Claude Sonnet: Long documents, analysis
Gemini Flash: Multimodal, cost-effective
Gemini Pro: Very long context, research

Use GPT-4o-mini when:

  • Cost is primary concern
  • Tasks are straightforward
  • High volume needed
  • Quick responses required

Use GPT-4o when:

  • Quality is critical
  • Complex reasoning needed
  • Code generation
  • Mathematical tasks

Use Claude Sonnet when:

  • Long document analysis
  • Detailed writing
  • Research tasks
  • Need 200K context

Use Claude Haiku when:

  • Speed critical
  • Simple tasks
  • High throughput
  • Cost-effective

Use Gemini Flash when:

  • Multimodal input
  • Cost-effective
  • Good balance
  • Video processing

Use Gemini Pro when:

  • Very long context (1M+ tokens)
  • Research papers
  • Large codebases
  • Book analysis

Close providers

provider, _ := openai.NewOpenAIProvider(...)
defer provider.Close() // Essential

Reuse providers

// Good: One provider, many requests
provider := createProvider()
for _, prompt := range prompts {
    provider.Complete(ctx, prompt, config)
}

// Bad: New provider per request
for _, prompt := range prompts {
    provider := createProvider()
    provider.Complete(ctx, prompt, config)
    provider.Close()
}

Handle provider errors

response, err := provider.Complete(ctx, messages, config)
if err != nil {
    if errors.Is(err, ErrRateLimited) {
        // Wait and retry
    } else if errors.Is(err, ErrInvalidKey) {
        // Check credentials
    } else {
        // Fallback provider
    }
}
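
For rate-limit errors, a short exponential backoff often resolves the problem without switching providers. A minimal sketch, reusing the same ErrRateLimited sentinel shown above:

var response *types.ProviderResponse
var err error
backoff := time.Second
for attempt := 0; attempt < 3; attempt++ {
    response, err = provider.Complete(ctx, messages, config)
    if err == nil || !errors.Is(err, ErrRateLimited) {
        break // success, or an error that retrying will not fix
    }
    time.Sleep(backoff)
    backoff *= 2 // 1s, 2s, 4s
}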

Set context timeouts

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
response, err := provider.Complete(ctx, messages, config)
import "github.com/AltairaLabs/PromptKit/runtime/providers/mock"
func TestWithMock(t *testing.T) {
mockProvider := mock.NewMockProvider()
mockProvider.SetResponse("Test response")
response, err := mockProvider.Complete(ctx, messages, nil)
assert.NoError(t, err)
assert.Equal(t, "Test response", response.Content)
}
Mock providers give you:

  • No API calls
  • No costs
  • Fast tests
  • Predictable responses
  • Offline testing

type ProviderMetrics struct {
    RequestCount int
    ErrorCount   int
    TotalCost    float64
    TotalTokens  int
    AvgLatency   time.Duration
}

func TrackRequest(provider string, response *ProviderResponse, err error) {
    metrics := GetMetrics(provider)
    metrics.RequestCount++
    if err != nil {
        metrics.ErrorCount++
    } else {
        metrics.TotalCost += response.Cost
        metrics.TotalTokens += response.Usage.TotalTokens
    }
}
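
GetMetrics is not defined above; one simple way to back it is a map keyed by provider name. This is a sketch, not PromptKit API, and it needs a mutex if TrackRequest is called from multiple goroutines:

var providerMetrics = map[string]*ProviderMetrics{}

// GetMetrics returns the metrics record for a provider, creating it on first use.
func GetMetrics(provider string) *ProviderMetrics {
    m, ok := providerMetrics[provider]
    if !ok {
        m = &ProviderMetrics{}
        providerMetrics[provider] = m
    }
    return m
}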

costTracker := middleware.NewCostTracker()

// Use in pipeline
pipe := pipeline.NewPipeline(
    middleware.ProviderMiddleware(provider, nil, costTracker, config),
)

// Check costs
fmt.Printf("Total cost: $%.4f\n", costTracker.TotalCost())

Providers are:

  • Abstracted - Common interface for all LLMs
  • Flexible - Easy to switch or combine
  • Configurable - Fine-tune behavior
  • Testable - Mock for unit tests
  • Monitorable - Track usage and costs