# Providers
Understanding LLM providers in PromptKit.
## What is a Provider?

A provider is an LLM service (OpenAI, Anthropic, Google) that generates text responses. PromptKit abstracts providers behind a common interface.
## Supported Providers

### OpenAI
Section titled “OpenAI”- Models: GPT-4o, GPT-4o-mini, GPT-4-turbo, GPT-3.5-turbo
- Features: Function calling, JSON mode, vision, streaming
- Pricing: Pay per token, varies by model
### Anthropic (Claude)

- Models: Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus
- Features: Function calling, vision, 200K context, streaming
- Pricing: Pay per token, varies by model
### Google (Gemini)

- Models: Gemini 1.5 Pro, Gemini 1.5 Flash
- Features: Function calling, multimodal, 1M+ context, streaming
- Pricing: Pay per token, free tier available
### Ollama (Local)

- Models: Llama 3.2, Mistral, LLaVA, DeepSeek, Phi, and more
- Features: Function calling, vision (LLaVA), streaming, OpenAI-compatible API
- Pricing: Free (local inference, no API costs)
### vLLM (High-Performance Inference)

- Models: Any HuggingFace model, Llama 3.x, Mistral, Qwen, Phi, LLaVA, and more
- Features: Function calling, vision, streaming, guided decoding, beam search, GPU-accelerated high-throughput
- Pricing: Free (self-hosted, no API costs)
## Platform Support

PromptKit supports running models on cloud hyperscaler platforms in addition to direct API access:
### AWS Bedrock

- Authentication: Uses the AWS SDK credential chain (IRSA, IAM roles, env vars)
- Models: Claude models via Anthropic partnership
- Benefits: Enterprise security, VPC integration, no API key management
### Google Cloud Vertex AI

- Authentication: Uses GCP Application Default Credentials (Workload Identity, service accounts)
- Models: Claude and Gemini models
- Benefits: GCP integration, enterprise compliance, unified billing
### Azure AI Foundry

- Authentication: Uses Azure AD tokens (Managed Identity, service principals)
- Models: OpenAI models via Azure partnership
- Benefits: Azure integration, enterprise security, compliance
## Why Provider Abstraction?

Problem: each provider has a different API.

```go
// OpenAI specific
openai.CreateChatCompletion(...)

// Claude specific
anthropic.Messages(...)

// Gemini specific
genai.GenerateContent(...)
```

Solution: a common interface.

```go
// Works with any provider
var provider types.Provider
response, err := provider.Complete(ctx, messages, config)
```

## Provider Interface

```go
type Provider interface {
	Complete(ctx context.Context, messages []Message, config *ProviderConfig) (*ProviderResponse, error)
	CompleteStream(ctx context.Context, messages []Message, config *ProviderConfig) (StreamReader, error)
	GetProviderName() string
	Close() error
}
```

## Using Providers
### Basic Usage

```go
// Create provider
provider, err := openai.NewOpenAIProvider(apiKey, "gpt-4o-mini")
if err != nil {
	log.Fatal(err)
}
defer provider.Close()

// Send request
messages := []types.Message{
	{Role: "user", Content: "Hello"},
}
response, err := provider.Complete(ctx, messages, nil)
if err != nil {
	log.Fatal(err)
}

fmt.Println(response.Content)
```

### With Configuration
```go
config := &types.ProviderConfig{
	MaxTokens:   500,
	Temperature: 0.7,
	TopP:        0.9,
}

response, err := provider.Complete(ctx, messages, config)
```

### Streaming
```go
stream, err := provider.CompleteStream(ctx, messages, config)
if err != nil {
	log.Fatal(err)
}
defer stream.Close()

for {
	chunk, err := stream.Recv()
	if err != nil {
		break
	}
	fmt.Print(chunk.Content)
}
```

## Credential Configuration
PromptKit supports flexible credential configuration with a resolution chain.

### Resolution Order

Credentials are resolved in the following priority order (a minimal sketch of this chain follows the list):

1. `api_key`: Explicit API key in configuration
2. `credential_file`: Read the API key from a file path
3. `credential_env`: Read the API key from the specified environment variable
4. Default env vars: Fall back to provider-specific defaults (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.)
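To make the chain concrete, here is a minimal sketch of the resolution logic. `CredentialConfig` and `resolveAPIKey` are illustrative names for this example, not PromptKit's actual internals:

```go
import (
	"fmt"
	"os"
	"strings"
)

// CredentialConfig mirrors the YAML credential block
// (field names are illustrative, not PromptKit's API).
type CredentialConfig struct {
	APIKey         string // credential.api_key
	CredentialFile string // credential.credential_file
	CredentialEnv  string // credential.credential_env
}

// resolveAPIKey walks the priority order described above.
func resolveAPIKey(cred CredentialConfig, defaultEnv string) (string, error) {
	// 1. Explicit API key wins.
	if cred.APIKey != "" {
		return cred.APIKey, nil
	}
	// 2. Read the key from a file (e.g. a mounted secret).
	if cred.CredentialFile != "" {
		data, err := os.ReadFile(cred.CredentialFile)
		if err != nil {
			return "", fmt.Errorf("reading credential file: %w", err)
		}
		return strings.TrimSpace(string(data)), nil
	}
	// 3. Read the key from the configured environment variable.
	if cred.CredentialEnv != "" {
		if key := os.Getenv(cred.CredentialEnv); key != "" {
			return key, nil
		}
	}
	// 4. Fall back to the provider default (e.g. OPENAI_API_KEY).
	if key := os.Getenv(defaultEnv); key != "" {
		return key, nil
	}
	return "", fmt.Errorf("no credential found (tried config, file, env, %s)", defaultEnv)
}
```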
### Configuration Examples

```yaml
# Explicit API key (not recommended for production)
credential:
  api_key: "sk-..."

# Read from file (good for secrets management)
credential:
  credential_file: /run/secrets/openai-key

# Read from custom env var (useful for multiple providers)
credential:
  credential_env: OPENAI_PROD_API_KEY
```

### Per-Provider Credentials
Configure different credentials for the same provider type:

```yaml
providers:
  - id: openai-prod
    type: openai
    model: gpt-4o
    credential:
      credential_env: OPENAI_PROD_KEY

  - id: openai-dev
    type: openai
    model: gpt-4o-mini
    credential:
      credential_env: OPENAI_DEV_KEY
```

### Platform Credentials
For cloud platforms, credentials are handled automatically via SDK credential chains:

```yaml
# AWS Bedrock - uses IRSA, IAM roles, or AWS_* env vars
- id: claude-bedrock
  type: claude
  model: claude-3-5-sonnet-20241022
  platform:
    type: bedrock
    region: us-west-2

# GCP Vertex AI - uses Workload Identity or GOOGLE_APPLICATION_CREDENTIALS
- id: claude-vertex
  type: claude
  model: claude-3-5-sonnet-20241022
  platform:
    type: vertex
    region: us-central1
    project: my-gcp-project

# Azure AI - uses Managed Identity or AZURE_* env vars
- id: gpt4-azure
  type: openai
  model: gpt-4o
  platform:
    type: azure
    endpoint: https://my-resource.openai.azure.com
```

## Provider Configuration
Section titled “Provider Configuration”Common Parameters
Section titled “Common Parameters”type ProviderConfig struct { MaxTokens int // Output limit (default: 4096) Temperature float64 // Randomness 0-2 (default: 1.0) TopP float64 // Nucleus sampling 0-1 (default: 1.0) Seed *int // Reproducibility (optional) StopSequences []string // Stop generation (optional)}Temperature
Controls randomness:
- 0.0: Deterministic, same output
- 0.5: Balanced
- 1.0: Creative (default)
- 2.0: Very random
```go
// Factual tasks
config := &types.ProviderConfig{Temperature: 0.2}

// Creative tasks
config := &types.ProviderConfig{Temperature: 1.2}
```

### Max Tokens
Limits output length:
```go
// Short responses
config := &types.ProviderConfig{MaxTokens: 100}

// Long responses
config := &types.ProviderConfig{MaxTokens: 4096}
```

## Multi-Provider Strategies
Section titled “Multi-Provider Strategies”Fallback
Section titled “Fallback”Try providers in order:
providers := []types.Provider{primary, secondary, tertiary}
var response *types.ProviderResponsevar err error
for _, provider := range providers { response, err = provider.Complete(ctx, messages, config) if err == nil { break // Success } log.Printf("Provider %s failed: %v", provider.GetProviderName(), err)}Load Balancing
Distribute requests across providers round-robin:

```go
var (
	providers = []types.Provider{openai1, openai2, claude}
	current   uint64
)

// GetNextProvider rotates through providers; the atomic counter
// keeps it safe under concurrent use.
func GetNextProvider() types.Provider {
	n := atomic.AddUint64(&current, 1) - 1
	return providers[n%uint64(len(providers))]
}
```

### Cost-Based Routing
Route by cost:

```go
func SelectProvider(complexity string) types.Provider {
	switch complexity {
	case "simple":
		return gpt4oMini // Cheapest
	case "complex":
		return gpt4o // Best quality
	case "long_context":
		return gemini // Largest context
	default:
		return gpt4oMini
	}
}
```

### Quality-Based Routing
Route by quality needs:

```go
func SelectByQuality(task string) types.Provider {
	if task == "code_generation" {
		return gpt4o // Best for code
	} else if task == "long_document" {
		return claude // Best for long docs
	} else {
		return gpt4oMini // Good enough
	}
}
```

## Provider Comparison
Section titled “Provider Comparison”Performance
Section titled “Performance”| Provider | Latency | Throughput | Context |
|---|---|---|---|
| GPT-4o-mini | ~500ms | High | 128K |
| GPT-4o | ~1s | Medium | 128K |
| Claude Sonnet | ~1s | Medium | 200K |
| Claude Haiku | ~400ms | High | 200K |
| Gemini Flash | ~600ms | High | 1M+ |
| Gemini Pro | ~1.5s | Medium | 1M+ |
### Pricing

| Provider | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o-mini | $0.15 | $0.60 |
| GPT-4o | $2.50 | $10.00 |
| Claude Haiku | $0.25 | $1.25 |
| Claude Sonnet | $3.00 | $15.00 |
| Gemini Flash | $0.075 | $0.30 |
| Gemini Pro | $1.25 | $5.00 |
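As a rough budgeting aid, the sketch below turns the pricing table into a per-request cost estimate. The rates are copied from the table above and will drift as providers update their pricing:

```go
package main

import "fmt"

// USD per 1M tokens, copied from the pricing table above.
var pricing = map[string]struct{ in, out float64 }{
	"gpt-4o-mini":   {0.15, 0.60},
	"gpt-4o":        {2.50, 10.00},
	"claude-haiku":  {0.25, 1.25},
	"claude-sonnet": {3.00, 15.00},
	"gemini-flash":  {0.075, 0.30},
	"gemini-pro":    {1.25, 5.00},
}

// estimateCost returns the dollar cost of one request.
func estimateCost(model string, inputTokens, outputTokens int) float64 {
	p := pricing[model]
	return float64(inputTokens)/1e6*p.in + float64(outputTokens)/1e6*p.out
}

func main() {
	// A 2,000-token prompt with a 500-token answer:
	fmt.Printf("gpt-4o:      $%.4f\n", estimateCost("gpt-4o", 2000, 500))      // $0.0100
	fmt.Printf("gpt-4o-mini: $%.4f\n", estimateCost("gpt-4o-mini", 2000, 500)) // $0.0006
}
```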
## Use Cases

- GPT-4o-mini: General purpose, cost-effective
- GPT-4o: Complex reasoning, code generation
- Claude Haiku: Fast responses, high volume
- Claude Sonnet: Long documents, analysis
- Gemini Flash: Multimodal, cost-effective
- Gemini Pro: Very long context, research
## Provider Selection Guide

The lists below summarize when each model fits best; a consolidated routing sketch follows them.

### Choose GPT-4o-mini when:

- Cost is primary concern
Section titled “Choose GPT-4o-mini when:”- Cost is primary concern
- Tasks are straightforward
- High volume needed
- Quick responses required
### Choose GPT-4o when:

- Quality is critical
- Complex reasoning needed
- Code generation
- Mathematical tasks
### Choose Claude Sonnet when:

- Long document analysis
- Detailed writing
- Research tasks
- Need 200K context
### Choose Claude Haiku when:

- Speed critical
- Simple tasks
- High throughput
- Cost-effective
### Choose Gemini Flash when:

- Multimodal input
- Cost-effective
- Good balance
- Video processing
### Choose Gemini Pro when:

- Very long context (1M+ tokens)
- Research papers
- Large codebases
- Book analysis
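Taken together, the guide reduces to a single routing helper. The sketch below encodes those rules under the assumption that the providers were created elsewhere; the variable and type names are illustrative, not part of PromptKit:

```go
// Requirements captures the criteria from the guide above.
type Requirements struct {
	ContextTokens   int  // prompt size in tokens
	NeedsVision     bool // multimodal input
	Complex         bool // complex reasoning, code, or math
	LatencyCritical bool
}

// chooseProvider applies the guide's rules in priority order.
// gpt4o, gpt4oMini, claudeSonnet, claudeHaiku, geminiFlash, and
// geminiPro are assumed to be providers constructed elsewhere.
func chooseProvider(r Requirements) types.Provider {
	switch {
	case r.ContextTokens > 200_000:
		return geminiPro // only option past Claude's 200K window
	case r.ContextTokens > 128_000:
		return claudeSonnet // beyond GPT-4o's 128K window
	case r.NeedsVision:
		return geminiFlash // multimodal and cost-effective
	case r.Complex:
		return gpt4o // quality-critical reasoning and code
	case r.LatencyCritical:
		return claudeHaiku // fastest option in the table above
	default:
		return gpt4oMini // cheap general-purpose default
	}
}
```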
## Best Practices

### Resource Management

✅ Close providers

```go
provider, _ := openai.NewOpenAIProvider(...)
defer provider.Close() // Essential
```

✅ Reuse providers
```go
// Good: One provider, many requests
provider := createProvider()
for _, prompt := range prompts {
	provider.Complete(ctx, prompt, config)
}

// Bad: New provider per request
for _, prompt := range prompts {
	provider := createProvider()
	provider.Complete(ctx, prompt, config)
	provider.Close()
}
```

### Error Handling
✅ Handle provider errors

```go
response, err := provider.Complete(ctx, messages, config)
if err != nil {
	if errors.Is(err, ErrRateLimited) {
		// Wait and retry
	} else if errors.Is(err, ErrInvalidKey) {
		// Check credentials
	} else {
		// Fallback provider
	}
}
```
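For the rate-limit branch, the usual pattern is a bounded retry with exponential backoff. A minimal sketch reusing the `ErrRateLimited` sentinel from above:

```go
import (
	"context"
	"errors"
	"fmt"
	"time"
)

// completeWithRetry retries rate-limited requests with exponential backoff.
func completeWithRetry(ctx context.Context, p types.Provider,
	messages []types.Message, config *types.ProviderConfig) (*types.ProviderResponse, error) {

	backoff := time.Second
	for attempt := 0; attempt < 5; attempt++ {
		response, err := p.Complete(ctx, messages, config)
		if err == nil {
			return response, nil
		}
		if !errors.Is(err, ErrRateLimited) {
			return nil, err // non-retryable: fail fast or fall back
		}
		select {
		case <-time.After(backoff):
			backoff *= 2 // 1s, 2s, 4s, 8s, ...
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
	return nil, fmt.Errorf("still rate limited after retries")
}
```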
### Timeouts

✅ Set context timeouts

```go
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

response, err := provider.Complete(ctx, messages, config)
```

## Testing with Providers
Section titled “Testing with Providers”Mock Provider
Section titled “Mock Provider”import "github.com/AltairaLabs/PromptKit/runtime/providers/mock"
func TestWithMock(t *testing.T) { mockProvider := mock.NewMockProvider() mockProvider.SetResponse("Test response")
response, err := mockProvider.Complete(ctx, messages, nil) assert.NoError(t, err) assert.Equal(t, "Test response", response.Content)}Benefits
Section titled “Benefits”- No API calls
- No costs
- Fast tests
- Predictable responses
- Offline testing
## Monitoring Providers

### Track Usage

```go
type ProviderMetrics struct {
	RequestCount int
	ErrorCount   int
	TotalCost    float64
	TotalTokens  int
	AvgLatency   time.Duration
}

func TrackRequest(provider string, response *ProviderResponse, err error) {
	metrics := GetMetrics(provider)
	metrics.RequestCount++

	if err != nil {
		metrics.ErrorCount++
	} else {
		metrics.TotalCost += response.Cost
		metrics.TotalTokens += response.Usage.TotalTokens
	}
}
```

### Monitor Costs
Section titled “Monitor Costs”costTracker := middleware.NewCostTracker()
// Use in pipelinepipe := pipeline.NewPipeline( middleware.ProviderMiddleware(provider, nil, costTracker, config),)
// Check costsfmt.Printf("Total cost: $%.4f\n", costTracker.TotalCost())Summary
Providers are:
- ✅ Abstracted - Common interface for all LLMs
- ✅ Flexible - Easy to switch or combine
- ✅ Configurable - Fine-tune behavior
- ✅ Testable - Mock for unit tests
- ✅ Monitorable - Track usage and costs
## Related Documentation

- Provider System Explanation - Architecture details
- Provider Reference - API documentation
- Cloud Provider Examples - Bedrock, Vertex, Azure examples
- Multi-Provider Fallback - Implementation guide
- Cost Monitoring - Track expenses