Providers
LLM provider implementations with a unified API.
Overview
PromptKit supports multiple LLM providers through a common interface:
- OpenAI: GPT-4, GPT-4o, GPT-3.5
- Anthropic Claude: Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku
- Google Gemini: Gemini 1.5 Pro, Gemini 1.5 Flash
- Ollama: Local LLMs (Llama, Mistral, LLaVA, DeepSeek)
- vLLM: High-performance inference (GPU-accelerated, high-throughput)
- Mock: Testing and development
All providers implement the Provider interface for text completion and the ToolSupport interface for function calling.
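Because every backend satisfies the same Provider interface, calling code can stay provider-agnostic. A minimal sketch of that idea (the runtime/types import path and the `ask` helper are illustrative assumptions; the constructor and request fields follow the examples later on this page):

```go
package main

import (
    "context"
    "fmt"
    "log"

    "github.com/AltairaLabs/PromptKit/runtime/providers"
    "github.com/AltairaLabs/PromptKit/runtime/providers/openai"
    "github.com/AltairaLabs/PromptKit/runtime/types"
)

// ask works against the Provider interface, so any backend can be swapped in.
func ask(ctx context.Context, p providers.Provider, question string) (string, error) {
    resp, err := p.Predict(ctx, providers.PredictionRequest{
        System:    "You are a helpful assistant.",
        Messages:  []types.Message{{Role: "user", Content: question}},
        MaxTokens: 200,
    })
    if err != nil {
        return "", err
    }
    return resp.Content, nil
}

func main() {
    provider := openai.NewOpenAIProvider("openai", "gpt-4o-mini", "", openai.DefaultProviderDefaults(), false)
    defer provider.Close()

    answer, err := ask(context.Background(), provider, "What is 2+2?")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(answer)
}
```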
Core Interfaces
Provider

```go
type Provider interface {
    ID() string
    Predict(ctx context.Context, req PredictionRequest) (PredictionResponse, error)

    // Streaming
    PredictStream(ctx context.Context, req PredictionRequest) (<-chan StreamChunk, error)
    SupportsStreaming() bool

    ShouldIncludeRawOutput() bool
    Close() error

    // Cost calculation
    CalculateCost(inputTokens, outputTokens, cachedTokens int) types.CostInfo
}
```

ToolSupport

```go
type ToolSupport interface {
    Provider

    // Convert tools to provider format
    BuildTooling(descriptors []*ToolDescriptor) (interface{}, error)

    // Execute with tools
    PredictWithTools(
        ctx context.Context,
        req PredictionRequest,
        tools interface{},
        toolChoice string,
    ) (PredictionResponse, []types.MessageToolCall, error)
}
```

MultimodalSupport

Providers that support images, audio, or video inputs implement MultimodalSupport:

```go
type MultimodalSupport interface {
    Provider

    // Get supported multimodal capabilities
    GetMultimodalCapabilities() MultimodalCapabilities

    // Execute with multimodal content
    PredictMultimodal(ctx context.Context, req PredictionRequest) (PredictionResponse, error)

    // Stream with multimodal content
    PredictMultimodalStream(ctx context.Context, req PredictionRequest) (<-chan StreamChunk, error)
}
```

MultimodalCapabilities:

```go
type MultimodalCapabilities struct {
    SupportsImages bool     // Can process image inputs
    SupportsAudio  bool     // Can process audio inputs
    SupportsVideo  bool     // Can process video inputs
    ImageFormats   []string // Supported image MIME types
    AudioFormats   []string // Supported audio MIME types
    VideoFormats   []string // Supported video MIME types
    MaxImageSizeMB int      // Max image size (0 = unlimited/unknown)
    MaxAudioSizeMB int      // Max audio size (0 = unlimited/unknown)
    MaxVideoSizeMB int      // Max video size (0 = unlimited/unknown)
}
```

Provider Multimodal Support:
| Provider | Images | Audio | Video | Notes |
|---|---|---|---|---|
| OpenAI GPT-4o/4o-mini | ✅ | ❌ | ❌ | JPEG, PNG, GIF, WebP |
| Anthropic Claude 3.5 | ✅ | ❌ | ❌ | JPEG, PNG, GIF, WebP |
| Google Gemini 1.5 | ✅ | ✅ | ✅ | Full multimodal support |
| Ollama (LLaVA, Llama 3.2 Vision) | ✅ | ❌ | ❌ | JPEG, PNG, GIF, WebP |
| vLLM (LLaVA, vision models) | ✅ | ❌ | ❌ | JPEG, PNG, GIF, WebP, 20MB limit |
Helper Functions:
```go
// Check if provider supports multimodal
func SupportsMultimodal(p Provider) bool

// Get multimodal provider (returns nil if not supported)
func GetMultimodalProvider(p Provider) MultimodalSupport

// Check specific media type support
func HasImageSupport(p Provider) bool
func HasAudioSupport(p Provider) bool
func HasVideoSupport(p Provider) bool

// Check format compatibility
func IsFormatSupported(p Provider, contentType string, mimeType string) bool

// Validate message compatibility
func ValidateMultimodalMessage(p Provider, msg types.Message) error
```

Usage Example:

```go
// Check capabilities
if providers.HasImageSupport(provider) {
    caps := providers.GetMultimodalProvider(provider).GetMultimodalCapabilities()
    fmt.Printf("Max image size: %d MB\n", caps.MaxImageSizeMB)
}

// Send multimodal request
req := providers.PredictionRequest{
    System: "You are a helpful assistant.",
    Messages: []types.Message{
        {
            Role: "user",
            Parts: []types.ContentPart{
                {Type: "text", Text: "What's in this image?"},
                {
                    Type: "image",
                    Media: &types.MediaContent{
                        Type:     "image",
                        MIMEType: "image/jpeg",
                        Data:     imageBase64,
                    },
                },
            },
        },
    },
}

if mp := providers.GetMultimodalProvider(provider); mp != nil {
    resp, err := mp.PredictMultimodal(ctx, req)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(resp.Content)
}
```
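The validation helpers can also guard a request before it is sent. A minimal sketch reusing `req` from the example above (the `"image"` content-type string passed to IsFormatSupported is an assumption, not confirmed by the source):

```go
// Reject messages the provider cannot handle before calling the API.
if err := providers.ValidateMultimodalMessage(provider, req.Messages[0]); err != nil {
    log.Fatalf("provider cannot handle this message: %v", err)
}

// Or check a single format explicitly (content type value assumed).
if !providers.IsFormatSupported(provider, "image", "image/jpeg") {
    log.Fatal("this provider does not accept JPEG images")
}
```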
MultimodalToolSupport

Providers that support both multimodal content and function calling implement MultimodalToolSupport:

```go
type MultimodalToolSupport interface {
    MultimodalSupport
    ToolSupport

    // Execute with both multimodal content and tools
    PredictMultimodalWithTools(
        ctx context.Context,
        req PredictionRequest,
        tools interface{},
        toolChoice string,
    ) (PredictionResponse, []types.MessageToolCall, error)
}
```

Usage Example:

```go
// Use images with tool calls
tools, _ := provider.BuildTooling(toolDescriptors)

resp, toolCalls, err := provider.PredictMultimodalWithTools(
    ctx,
    multimodalRequest,
    tools,
    "auto",
)
if err != nil {
    log.Fatal(err)
}

// Response contains both text and any tool calls
fmt.Println(resp.Content)
for _, call := range toolCalls {
    fmt.Printf("Tool called: %s\n", call.Name)
}
```

Request/Response Types
PredictionRequest

```go
type PredictionRequest struct {
    System      string
    Messages    []types.Message
    Temperature float32
    TopP        float32
    MaxTokens   int
    Seed        *int
    Metadata    map[string]interface{}
}
```

PredictionResponse

```go
type PredictionResponse struct {
    Content    string
    Parts      []types.ContentPart // Multimodal content
    CostInfo   *types.CostInfo
    Latency    time.Duration
    Raw        []byte      // Raw API response
    RawRequest interface{} // Raw API request
    ToolCalls  []types.MessageToolCall
}
```

ProviderDefaults

```go
type ProviderDefaults struct {
    Temperature float32
    TopP        float32
    MaxTokens   int
    Pricing     Pricing
}
```

Pricing

```go
type Pricing struct {
    InputCostPer1K  float64 // Per 1K input tokens
    OutputCostPer1K float64 // Per 1K output tokens
}
```
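Pricing is expressed per 1K tokens, while the model tables below quote prices per 1M tokens, so divide by 1,000 when filling it in. A small worked sketch using the gpt-4o-mini rates from the table below:

```go
pricing := providers.Pricing{
    InputCostPer1K:  0.00015, // $0.15 per 1M input tokens
    OutputCostPer1K: 0.0006,  // $0.60 per 1M output tokens
}

// Cost of a call that used 2,000 input and 500 output tokens.
cost := float64(2000)/1000*pricing.InputCostPer1K +
    float64(500)/1000*pricing.OutputCostPer1K
fmt.Printf("$%.6f\n", cost) // $0.000600
```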
OpenAI Provider

Constructor

```go
func NewOpenAIProvider(
    id string,
    model string,
    baseURL string,
    defaults ProviderDefaults,
    includeRawOutput bool,
) *OpenAIProvider
```

Parameters:

- `id`: Provider identifier (e.g., "openai-gpt4")
- `model`: Model name (e.g., "gpt-4o-mini", "gpt-4-turbo")
- `baseURL`: Custom API URL (empty for the default `https://api.openai.com/v1`)
- `defaults`: Default parameters and pricing
- `includeRawOutput`: Include raw API response in output

Environment:

- `OPENAI_API_KEY`: Required API key

Example:

```go
provider := openai.NewOpenAIProvider(
    "openai",
    "gpt-4o-mini",
    "", // Use default URL
    openai.DefaultProviderDefaults(),
    false,
)
defer provider.Close()
```

Supported Models

| Model | Context | Cost (Input/Output per 1M tokens) |
|---|---|---|
| `gpt-4o` | 128K | $2.50 / $10.00 |
| `gpt-4o-mini` | 128K | $0.15 / $0.60 |
| `gpt-4-turbo` | 128K | $10.00 / $30.00 |
| `gpt-4` | 8K | $30.00 / $60.00 |
| `gpt-3.5-turbo` | 16K | $0.50 / $1.50 |
Features
- ✅ Streaming support
- ✅ Function calling
- ✅ Multimodal (vision)
- ✅ JSON mode
- ✅ Seed for reproducibility
- ✅ Token counting
Tool Support
```go
// Create tool provider
toolProvider := openai.NewOpenAIToolProvider(
    "openai",
    "gpt-4o-mini",
    "",
    openai.DefaultProviderDefaults(),
    false,
    nil, // Additional config
)

// Build tools in OpenAI format
tools, err := toolProvider.BuildTooling(toolDescriptors)
if err != nil {
    log.Fatal(err)
}

// Execute with tools
response, toolCalls, err := toolProvider.PredictWithTools(
    ctx,
    req,
    tools,
    "auto", // Tool choice: "auto", "required", "none", or specific tool
)
```

Anthropic Claude Provider

Constructor

```go
func NewClaudeProvider(
    id string,
    model string,
    baseURL string,
    defaults ProviderDefaults,
    includeRawOutput bool,
) *ClaudeProvider
```

Environment:

- `ANTHROPIC_API_KEY`: Required API key

Example:

```go
provider := claude.NewClaudeProvider(
    "claude",
    "claude-3-5-sonnet-20241022",
    "", // Use default URL
    claude.DefaultProviderDefaults(),
    false,
)
defer provider.Close()
```

Supported Models

| Model | Context | Cost (Input/Output per 1M tokens) |
|---|---|---|
| `claude-3-5-sonnet-20241022` | 200K | $3.00 / $15.00 |
| `claude-3-opus-20240229` | 200K | $15.00 / $75.00 |
| `claude-3-haiku-20240307` | 200K | $0.25 / $1.25 |
Features
- ✅ Streaming support
- ✅ Tool calling
- ✅ Multimodal (vision)
- ✅ Extended context (200K tokens)
- ✅ Prompt caching
- ✅ System prompts
Tool Support
```go
toolProvider := claude.NewClaudeToolProvider(
    "claude",
    "claude-3-5-sonnet-20241022",
    "",
    claude.DefaultProviderDefaults(),
    false,
)

response, toolCalls, err := toolProvider.PredictWithTools(ctx, req, tools, "auto")
```

Google Gemini Provider

Constructor

```go
func NewGeminiProvider(
    id string,
    model string,
    baseURL string,
    defaults ProviderDefaults,
    includeRawOutput bool,
) *GeminiProvider
```

Environment:

- `GEMINI_API_KEY`: Required API key

Example:

```go
provider := gemini.NewGeminiProvider(
    "gemini",
    "gemini-1.5-flash",
    "",
    gemini.DefaultProviderDefaults(),
    false,
)
defer provider.Close()
```

Supported Models

| Model | Context | Cost (Input/Output per 1M tokens) |
|---|---|---|
| `gemini-1.5-pro` | 2M | $1.25 / $5.00 |
| `gemini-1.5-flash` | 1M | $0.075 / $0.30 |
Features
- ✅ Streaming support
- ✅ Function calling
- ✅ Multimodal (vision, audio, video)
- ✅ Extended context (up to 2M tokens)
- ✅ Grounding with Google Search
Tool Support
```go
toolProvider := gemini.NewGeminiToolProvider(
    "gemini",
    "gemini-1.5-flash",
    "",
    gemini.DefaultProviderDefaults(),
    false,
)

response, toolCalls, err := toolProvider.PredictWithTools(ctx, req, tools, "auto")
```

Mock Provider

For testing and development.

Constructor

```go
func NewMockProvider(
    id string,
    model string,
    includeRawOutput bool,
) *MockProvider
```

Example:

```go
provider := mock.NewMockProvider("mock", "test-model", false)

// Configure responses
provider.AddResponse("Hello", "Hi there!")
provider.AddResponse("What is 2+2?", "4")
```

With Repository

```go
// Custom response repository
repo := &CustomMockRepository{
    responses: map[string]string{
        "hello": "Hello! How can I help?",
        "bye":   "Goodbye!",
    },
}

provider := mock.NewMockProviderWithRepository("mock", "test-model", false, repo)
```

Tool Support

```go
toolProvider := mock.NewMockToolProvider("mock", "test-model", false, nil)

// Configure tool call responses
toolProvider.ConfigureToolResponse("get_weather", `{"temp": 72, "conditions": "sunny"}`)
```
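A minimal sketch of the mock provider inside a Go test (the import paths and the exact matching behaviour of AddResponse — assumed here to key on the prompt text — are assumptions):

```go
func TestAssistantGreets(t *testing.T) {
    provider := mock.NewMockProvider("mock", "test-model", false)
    defer provider.Close()
    provider.AddResponse("Hello", "Hi there!")

    resp, err := provider.Predict(context.Background(), providers.PredictionRequest{
        Messages: []types.Message{{Role: "user", Content: "Hello"}},
    })
    if err != nil {
        t.Fatalf("Predict failed: %v", err)
    }
    if resp.Content != "Hi there!" {
        t.Errorf("unexpected response: %q", resp.Content)
    }
}
```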
Ollama Provider

Run local LLMs with zero API costs using Ollama. Uses the OpenAI-compatible /v1/chat/completions endpoint.

Constructor

```go
func NewOllamaProvider(
    id string,
    model string,
    baseURL string,
    defaults ProviderDefaults,
    includeRawOutput bool,
    additionalConfig map[string]interface{},
) *OllamaProvider
```

Parameters:

- `id`: Provider identifier (e.g., "ollama-llama")
- `model`: Model name (e.g., "llama3.2:1b", "mistral", "llava")
- `baseURL`: Ollama server URL (default: `http://localhost:11434`)
- `defaults`: Default parameters and pricing (typically zero cost)
- `includeRawOutput`: Include raw API response in output
- `additionalConfig`: Extra options, including `keep_alive` for model persistence

Environment:

- No API key required (local inference)
- `OLLAMA_HOST`: Optional, alternative to the `baseURL` parameter

Example:

```go
provider := ollama.NewOllamaProvider(
    "ollama",
    "llama3.2:1b",
    "http://localhost:11434",
    ollama.DefaultProviderDefaults(),
    false,
    map[string]interface{}{
        "keep_alive": "5m", // Keep model loaded for 5 minutes
    },
)
defer provider.Close()
```

Supported Models

Any model available via `ollama pull`. Common models include:

| Model | Context | Cost |
|---|---|---|
| `llama3.2:1b` | 128K | Free (local) |
| `llama3.2:3b` | 128K | Free (local) |
| `llama3.1:8b` | 128K | Free (local) |
| `mistral` | 32K | Free (local) |
| `deepseek-r1:8b` | 64K | Free (local) |
| `phi3:mini` | 128K | Free (local) |
| `llava` | 4K | Free (local) |
| `llama3.2-vision` | 128K | Free (local) |

Run `ollama list` to see installed models, or `ollama pull <model>` to download new ones.
Features
- ✅ Streaming support
- ✅ Function calling (tool use)
- ✅ Multimodal (vision) - LLaVA, Llama 3.2 Vision
- ✅ Zero cost (local inference)
- ✅ Model persistence (`keep_alive` parameter)
- ✅ OpenAI-compatible API
- ❌ No API key required
Tool Support
```go
toolProvider := ollama.NewOllamaToolProvider(
    "ollama",
    "llama3.2:1b",
    "http://localhost:11434",
    ollama.DefaultProviderDefaults(),
    false,
    map[string]interface{}{"keep_alive": "5m"},
)

// Build tools in OpenAI format
tools, err := toolProvider.BuildTooling(toolDescriptors)
if err != nil {
    log.Fatal(err)
}

// Execute with tools
response, toolCalls, err := toolProvider.PredictWithTools(
    ctx,
    req,
    tools,
    "auto", // Tool choice: "auto", "required", "none"
)
```

Configuration via YAML

```yaml
spec:
  id: "ollama-llama"
  type: ollama
  model: llama3.2:1b
  base_url: "http://localhost:11434"
  additional_config:
    keep_alive: "5m"
```

Docker Setup

Run Ollama with Docker Compose:

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    healthcheck:
      test: ["CMD-SHELL", "ollama list || exit 1"]
      interval: 10s
      timeout: 30s
      retries: 5
      start_period: 30s

volumes:
  ollama_data:
```

Then pull a model:

```bash
docker exec ollama ollama pull llama3.2:1b
```

vLLM Provider
High-throughput inference engine optimized for GPU-accelerated LLM serving. Supports guided decoding, advanced sampling strategies, and efficient batching for maximum performance.
Constructor
```go
func NewVLLMProvider(
    id string,
    model string,
    baseURL string,
    defaults ProviderDefaults,
    includeRawOutput bool,
    additionalConfig map[string]interface{},
) *VLLMProvider
```

Parameters:

- `id`: Provider identifier (e.g., "vllm-llama")
- `model`: Model name served by vLLM (e.g., "meta-llama/Llama-3.2-1B-Instruct")
- `baseURL`: vLLM server URL (default: `http://localhost:8000`)
- `defaults`: Default parameters and pricing (typically zero cost for self-hosted)
- `includeRawOutput`: Include raw API response in output
- `additionalConfig`: vLLM-specific options (guided decoding, beam search, etc.)

Environment:

- No API key required (self-hosted inference)
- `VLLM_API_KEY`: Optional API key if the vLLM server is configured with authentication

Example:

```go
provider := vllm.NewVLLMProvider(
    "vllm",
    "meta-llama/Llama-3.2-1B-Instruct",
    "http://localhost:8000",
    vllm.DefaultProviderDefaults(),
    false,
    map[string]interface{}{
        "use_beam_search": false,
        "best_of":         1,
    },
)
defer provider.Close()
```

Supported Models

Any model supported by vLLM from HuggingFace. Common models include:

| Model | Context | Features | Cost |
|---|---|---|---|
| `meta-llama/Llama-3.2-1B-Instruct` | 128K | Fast, compact | Free (self-hosted) |
| `meta-llama/Llama-3.2-3B-Instruct` | 128K | Balanced | Free (self-hosted) |
| `meta-llama/Llama-3.1-8B-Instruct` | 128K | High quality | Free (self-hosted) |
| `mistralai/Mistral-7B-Instruct-v0.3` | 32K | Good performance | Free (self-hosted) |
| `Qwen/Qwen2.5-7B-Instruct` | 128K | Multilingual | Free (self-hosted) |
| `microsoft/Phi-3-mini-128k-instruct` | 128K | Efficient | Free (self-hosted) |
| `llava-hf/llava-v1.6-mistral-7b-hf` | 4K | Vision support | Free (self-hosted) |
Check the vLLM documentation for the full model compatibility list.
Features
- ✅ Streaming support (SSE)
- ✅ Function calling (tool use)
- ✅ Multimodal (vision) - LLaVA and vision-capable models
- ✅ GPU acceleration (CUDA, ROCm)
- ✅ Guided decoding (JSON schema, regex, grammar)
- ✅ Beam search for higher quality
- ✅ Tensor parallelism for large models
- ✅ PagedAttention for memory efficiency
- ✅ Continuous batching for throughput
- ✅ OpenAI-compatible API
- ✅ Zero API costs (self-hosted)
- ❌ No API key required
vLLM-Specific Parameters
Additional configuration options beyond the standard OpenAI parameters:

```go
additionalConfig := map[string]interface{}{
    // Sampling & Quality
    "use_beam_search": false, // Enable beam search (slower, higher quality)
    "best_of":         1,     // Generate N candidates, return best
    "ignore_eos":      false, // Continue past EOS token

    // Guided Decoding
    "guided_json":               jsonSchema,            // Force JSON output matching schema
    "guided_regex":              "^[0-9]+$",            // Force output matching regex
    "guided_choice":             []string{"yes", "no"}, // Force choice from options
    "guided_grammar":            bnfGrammar,            // Force output matching BNF grammar
    "guided_whitespace_pattern": nil,                   // Custom whitespace handling
}
```

Example with Guided JSON:

```go
// Force JSON output matching schema
jsonSchema := `{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer"}
  },
  "required": ["name", "age"]
}`

provider := vllm.NewVLLMProvider(
    "vllm",
    "meta-llama/Llama-3.2-3B-Instruct",
    "http://localhost:8000",
    vllm.DefaultProviderDefaults(),
    false,
    map[string]interface{}{
        "guided_json": jsonSchema,
    },
)
```

Tool Support

```go
toolProvider := vllm.NewVLLMToolProvider(
    "vllm",
    "meta-llama/Llama-3.2-1B-Instruct",
    "http://localhost:8000",
    vllm.DefaultProviderDefaults(),
    false,
    nil,
)

// Build tools in OpenAI format
tools, err := toolProvider.BuildTooling(toolDescriptors)
if err != nil {
    log.Fatal(err)
}

// Execute with tools
response, toolCalls, err := toolProvider.PredictWithTools(
    ctx,
    req,
    tools,
    "auto", // Tool choice: "auto", "required", "none", or specific tool name
)
```

Configuration via YAML
```yaml
spec:
  id: "vllm-llama"
  type: vllm
  model: meta-llama/Llama-3.2-3B-Instruct
  base_url: "http://localhost:8000"
  additional_config:
    use_beam_search: false
    best_of: 1
```

Docker Setup

Run vLLM with Docker:

```bash
# CPU-only (for testing)
docker run --rm -it \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.2-1B-Instruct \
  --max-model-len 2048

# GPU-accelerated (recommended for production)
docker run --rm -it \
  --gpus all \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.2-3B-Instruct \
  --dtype half \
  --max-model-len 4096
```

Docker Compose:

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    ports:
      - "8000:8000"
    volumes:
      - vllm_cache:/root/.cache/huggingface
    command:
      - --model
      - meta-llama/Llama-3.2-3B-Instruct
      - --dtype
      - half
      - --max-model-len
      - "4096"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  vllm_cache:
```

Performance Tuning

Tensor Parallelism (multi-GPU):

```bash
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --tensor-parallel-size 2 # Split across 2 GPUs
```

Memory Optimization:

```bash
# Use 90% of GPU memory
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.2-3B-Instruct \
  --gpu-memory-utilization 0.9 \
  --max-model-len 8192
```

Quantization (reduce memory):

```bash
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --quantization awq # or 'gptq', 'squeezellm'
```

Comparison: vLLM vs Ollama
| Feature | vLLM | Ollama |
|---|---|---|
| Target Use Case | High-performance GPU inference | Local development, ease of use |
| GPU Acceleration | Required (CUDA/ROCm) | Optional |
| Throughput | Very high (continuous batching) | Moderate |
| Model Loading | HuggingFace models directly | ollama pull model management |
| Guided Decoding | ✅ JSON schema, regex, grammar | ❌ |
| Beam Search | ✅ | ❌ |
| Tensor Parallelism | ✅ Multi-GPU support | ❌ |
| Quantization | ✅ AWQ, GPTQ, SqueezeLLM | ✅ GGUF format |
| API Compatibility | OpenAI-compatible | OpenAI-compatible |
| Setup Complexity | Moderate (GPU drivers, Docker) | Low (single binary) |
| Memory Efficiency | PagedAttention | Standard |
| Cost | Free (self-hosted) | Free (self-hosted) |
When to use vLLM:
- GPU-accelerated inference for performance
- Multi-GPU setups for large models
- Need structured output (guided decoding)
- Batch processing workloads requiring high throughput
- Advanced sampling strategies (beam search)
When to use Ollama:
- Local development and testing
- CPU-only environments
- Quick model experimentation
- Simpler setup requirements
Usage Examples
Basic Completion

```go
import (
    "context"

    "github.com/AltairaLabs/PromptKit/runtime/providers/openai"
)

provider := openai.NewOpenAIProvider(
    "openai",
    "gpt-4o-mini",
    "",
    openai.DefaultProviderDefaults(),
    false,
)
defer provider.Close()

req := providers.PredictionRequest{
    System: "You are a helpful assistant.",
    Messages: []types.Message{
        {Role: "user", Content: "What is 2+2?"},
    },
    Temperature: 0.7,
    MaxTokens:   100,
}

ctx := context.Background()
response, err := provider.Predict(ctx, req)
if err != nil {
    log.Fatal(err)
}

fmt.Printf("Response: %s\n", response.Content)
fmt.Printf("Cost: $%.6f\n", response.CostInfo.TotalCost)
fmt.Printf("Latency: %v\n", response.Latency)
```

Streaming Completion

```go
streamChan, err := provider.PredictStream(ctx, req)
if err != nil {
    log.Fatal(err)
}

var fullContent string
for chunk := range streamChan {
    if chunk.Error != nil {
        log.Printf("Stream error: %v\n", chunk.Error)
        break
    }

    if chunk.Delta != "" {
        fullContent += chunk.Delta
        fmt.Print(chunk.Delta)
    }

    if chunk.Done {
        fmt.Printf("\n\nComplete! Tokens: %d\n", chunk.TokenCount)
    }
}
```

With Function Calling
```go
toolProvider := openai.NewOpenAIToolProvider(
    "openai",
    "gpt-4o-mini",
    "",
    openai.DefaultProviderDefaults(),
    false,
    nil,
)

// Define tools
toolDescs := []*providers.ToolDescriptor{
    {
        Name:        "get_weather",
        Description: "Get current weather for a location",
        InputSchema: json.RawMessage(`{
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }`),
    },
}

// Build tools in provider format
tools, err := toolProvider.BuildTooling(toolDescs)
if err != nil {
    log.Fatal(err)
}

// Execute with tools
req.Messages = []types.Message{
    {Role: "user", Content: "What's the weather in San Francisco?"},
}

response, toolCalls, err := toolProvider.PredictWithTools(ctx, req, tools, "auto")
if err != nil {
    log.Fatal(err)
}

// Process tool calls
for _, call := range toolCalls {
    fmt.Printf("Tool: %s\n", call.Name)
    fmt.Printf("Args: %s\n", call.Arguments)
}
```

Multimodal (Vision)

```go
// Create message with image
msg := types.Message{
    Role:    "user",
    Content: "What's in this image?",
    Parts: []types.ContentPart{
        {
            Type: "image",
            ImageURL: &types.ImageURL{
                URL: "data:image/jpeg;base64,/9j/4AAQSkZJRg...",
            },
        },
    },
}

req.Messages = []types.Message{msg}
response, err := provider.Predict(ctx, req)
```

Cost Calculation

```go
// Manual cost calculation
costInfo := provider.CalculateCost(
    1000, // Input tokens
    500,  // Output tokens
    0,    // Cached tokens
)

fmt.Printf("Input cost: $%.6f\n", costInfo.InputCost)
fmt.Printf("Output cost: $%.6f\n", costInfo.OutputCost)
fmt.Printf("Total cost: $%.6f\n", costInfo.TotalCost)
```

Custom Provider Configuration

```go
// Custom pricing
customDefaults := providers.ProviderDefaults{
    Temperature: 0.8,
    TopP:        0.95,
    MaxTokens:   2000,
    Pricing: providers.Pricing{
        InputCostPer1K:  0.0001,
        OutputCostPer1K: 0.0002,
    },
}

provider := openai.NewOpenAIProvider(
    "custom-openai",
    "gpt-4o-mini",
    "",
    customDefaults,
    true, // Include raw output
)
```

Configuration
Default Provider Settings
OpenAI:

```go
func DefaultProviderDefaults() ProviderDefaults {
    return ProviderDefaults{
        Temperature: 0.7,
        TopP:        1.0,
        MaxTokens:   2000,
        Pricing: Pricing{
            InputCostPer1K:  0.00015, // gpt-4o-mini
            OutputCostPer1K: 0.0006,
        },
    }
}
```

Claude:

```go
func DefaultProviderDefaults() ProviderDefaults {
    return ProviderDefaults{
        Temperature: 0.7,
        TopP:        1.0,
        MaxTokens:   4096,
        Pricing: Pricing{
            InputCostPer1K:  0.003, // claude-3-5-sonnet
            OutputCostPer1K: 0.015,
        },
    }
}
```

Gemini:

```go
func DefaultProviderDefaults() ProviderDefaults {
    return ProviderDefaults{
        Temperature: 0.7,
        TopP:        0.95,
        MaxTokens:   8192,
        Pricing: Pricing{
            InputCostPer1K:  0.000075, // gemini-1.5-flash
            OutputCostPer1K: 0.0003,
        },
    }
}
```

Ollama:

```go
func DefaultProviderDefaults() ProviderDefaults {
    return ProviderDefaults{
        Temperature: 0.7,
        TopP:        0.9,
        MaxTokens:   2048,
        Pricing: Pricing{
            InputCostPer1K:  0.0, // Local inference - free
            OutputCostPer1K: 0.0,
        },
    }
}
```

vLLM:

```go
func DefaultProviderDefaults() ProviderDefaults {
    return ProviderDefaults{
        Temperature: 0.7,
        TopP:        0.95,
        MaxTokens:   2048,
        Pricing: Pricing{
            InputCostPer1K:  0.0, // Self-hosted inference - free
            OutputCostPer1K: 0.0,
        },
    }
}
```

Environment Variables
All providers support environment variable configuration:

- `OPENAI_API_KEY`: OpenAI authentication
- `ANTHROPIC_API_KEY`: Anthropic authentication
- `GEMINI_API_KEY`: Google Gemini authentication
- `OPENAI_BASE_URL`: Custom OpenAI-compatible endpoint
- `ANTHROPIC_BASE_URL`: Custom Claude endpoint
- `GEMINI_BASE_URL`: Custom Gemini endpoint
- `OLLAMA_HOST`: Ollama server URL (default: `http://localhost:11434`)
- `VLLM_API_KEY`: Optional vLLM authentication (if the server is configured with auth)
- `VLLM_BASE_URL`: vLLM server URL (default: `http://localhost:8000`)
Credential System
PromptKit provides a flexible credential system for authenticating with LLM providers.
Credential Interface
```go
type Credential interface {
    // Apply adds authentication to an HTTP request
    Apply(ctx context.Context, req *http.Request) error

    // Type returns the credential type identifier
    Type() string
}
```
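Providers apply credentials internally, but the interface can also be used directly. A minimal sketch (the endpoint URL and empty JSON body are placeholders, not part of PromptKit):

```go
cred := credentials.NewAPIKeyCredential(os.Getenv("OPENAI_API_KEY"))

httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost,
    "https://api.openai.com/v1/chat/completions", strings.NewReader(`{}`))
if err != nil {
    log.Fatal(err)
}
if err := cred.Apply(ctx, httpReq); err != nil {
    log.Fatal(err)
}
// httpReq now carries the default "Authorization: Bearer <key>" header.
```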
Credential Types

APIKeyCredential

Standard API key authentication (most providers):

```go
cred := credentials.NewAPIKeyCredential("sk-your-key",
    credentials.WithHeaderName("Authorization"), // Default
    credentials.WithPrefix("Bearer "),           // Default
)
```

AWSCredential

AWS SigV4 signing for Bedrock:

```go
cred, err := credentials.NewAWSCredential(ctx, "us-west-2")
// Uses AWS SDK default credential chain:
// - IRSA (EKS)
// - IAM instance roles
// - AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
```

GCPCredential

GCP OAuth tokens for Vertex AI:

```go
cred, err := credentials.NewGCPCredential(ctx, "us-central1", "my-project")
// Uses Application Default Credentials:
// - Workload Identity (GKE)
// - Service account keys
// - GOOGLE_APPLICATION_CREDENTIALS
```

AzureCredential

Azure AD tokens for Azure AI:

```go
cred, err := credentials.NewAzureCredential(ctx, "https://my-resource.openai.azure.com")
// Uses Azure SDK default credential chain:
// - Managed Identity
// - Azure CLI credentials
// - AZURE_CLIENT_ID / AZURE_TENANT_ID / AZURE_CLIENT_SECRET
```

Credential Resolution

The `credentials.Resolve()` function resolves credentials based on configuration:

```go
import "github.com/AltairaLabs/PromptKit/runtime/credentials"

cfg := credentials.ResolverConfig{
    ProviderType: "openai",
    CredentialConfig: &config.CredentialConfig{
        CredentialEnv: "MY_OPENAI_KEY", // Custom env var
    },
}

cred, err := credentials.Resolve(ctx, cfg)
if err != nil {
    log.Fatal(err)
}
```

Resolution Order:
1. `api_key` - Direct API key value
2. `credential_file` - Read from file path
3. `credential_env` - Read from specified env var
4. Default env vars - Provider-specific defaults (see the sketch below)
Default Environment Variables:
- OpenAI: `OPENAI_API_KEY`, `OPENAI_TOKEN`
- Claude: `ANTHROPIC_API_KEY`, `CLAUDE_API_KEY`
- Gemini: `GEMINI_API_KEY`, `GOOGLE_API_KEY`
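Resolving with no explicit credential configuration falls through to these provider-specific defaults. A minimal sketch (assumes `OPENAI_API_KEY` or `OPENAI_TOKEN` is set in the environment):

```go
cred, err := credentials.Resolve(ctx, credentials.ResolverConfig{
    ProviderType: "openai", // no CredentialConfig: fall back to OPENAI_API_KEY / OPENAI_TOKEN
})
if err != nil {
    log.Fatal(err)
}
fmt.Println("resolved credential type:", cred.Type())
```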
Platform Configuration
Platforms are hosting layers that wrap provider APIs with different authentication:

```go
type PlatformConfig struct {
    Type             string                 // bedrock, vertex, azure
    Region           string                 // AWS/GCP region
    Project          string                 // GCP project ID
    Endpoint         string                 // Custom endpoint URL
    AdditionalConfig map[string]interface{} // Platform-specific options
}
```

AWS Bedrock

```yaml
platform:
  type: bedrock
  region: us-west-2
```

Model names are automatically mapped:

- `claude-3-5-sonnet-20241022` → `anthropic.claude-3-5-sonnet-20241022-v2:0`

GCP Vertex AI

```yaml
platform:
  type: vertex
  region: us-central1
  project: my-gcp-project
```

Azure AI Foundry

```yaml
platform:
  type: azure
  endpoint: https://my-resource.openai.azure.com
```

Best Practices
1. Resource Management

```go
// Always close providers
provider := openai.NewOpenAIProvider(...)
defer provider.Close()
```

2. Error Handling

```go
response, err := provider.Predict(ctx, req)
if err != nil {
    // Check for specific error types
    if strings.Contains(err.Error(), "rate_limit_exceeded") {
        // Implement backoff
        time.Sleep(time.Second * 5)
        return retry()
    }
    return err
}
```

3. Cost Monitoring

```go
// Track costs across requests
var totalCost float64
for _, result := range results {
    totalCost += result.CostInfo.TotalCost
}
fmt.Printf("Total spend: $%.6f\n", totalCost)
```

4. Timeout Management

```go
// Use context timeout
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

response, err := provider.Predict(ctx, req)
```

5. Streaming Best Practices

```go
// Always drain channel
streamChan, err := provider.PredictStream(ctx, req)
if err != nil {
    return err
}

for chunk := range streamChan {
    if chunk.Error != nil {
        // Handle error but continue draining
        log.Printf("Error: %v", chunk.Error)
        continue
    }
    processChunk(chunk)
}
```

Performance Considerations
Latency

- OpenAI: 200-500ms TTFT, 1-3s total for short responses
- Claude: 300-600ms TTFT, similar total latency
- Gemini: 150-400ms TTFT, faster for simple queries
Throughput
- Rate limits: Vary by provider and tier
- OpenAI: 3,500-10,000 RPM
- Claude: 4,000-50,000 RPM
- Gemini: 2,000-15,000 RPM
Cost Optimization
- Use mini/flash models for simple tasks
- Implement caching for repeated queries (see the sketch after this list)
- Use streaming for better UX (doesn’t reduce cost)
- Monitor token usage and set an appropriate `MaxTokens`
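A minimal sketch of the caching idea above, wrapping any Provider in an in-memory cache keyed on the request (the `cachingProvider` type is illustrative, not part of PromptKit; it assumes identical requests may safely reuse a response):

```go
type cachingProvider struct {
    providers.Provider
    mu    sync.Mutex
    cache map[string]providers.PredictionResponse
}

func (c *cachingProvider) Predict(ctx context.Context, req providers.PredictionRequest) (providers.PredictionResponse, error) {
    key := req.System + "|" + fmt.Sprint(req.Messages)

    c.mu.Lock()
    if resp, ok := c.cache[key]; ok {
        c.mu.Unlock()
        return resp, nil // cache hit: no tokens spent
    }
    c.mu.Unlock()

    resp, err := c.Provider.Predict(ctx, req)
    if err == nil {
        c.mu.Lock()
        c.cache[key] = resp
        c.mu.Unlock()
    }
    return resp, err
}
```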
See Also
- MediaLoader - Unified media loading
- Pipeline Reference - Using providers in pipelines
- Tools Reference - Function calling
- Provider How-To - Configuration guide
- Provider Explanation - Architecture details
MediaLoader
Unified interface for loading media content from various sources (inline data, storage references, file paths, URLs).
Overview
MediaLoader abstracts media access, allowing providers to load media transparently regardless of where it's stored. This is essential for media externalization, where large media is stored on disk instead of being kept in memory.

```go
import "github.com/AltairaLabs/PromptKit/runtime/providers"

loader := providers.NewMediaLoader(providers.MediaLoaderConfig{
    StorageService:  fileStore,
    HTTPTimeout:     30 * time.Second,
    MaxURLSizeBytes: 50 * 1024 * 1024, // 50 MB
})
```

Type Definition
```go
type MediaLoader struct {
    // Unexported fields
}
```

Constructor

NewMediaLoader

Creates a new MediaLoader with the specified configuration.

```go
func NewMediaLoader(config MediaLoaderConfig) *MediaLoader
```

Parameters:

- `config` - MediaLoaderConfig with storage service and options

Returns:

- `*MediaLoader` - Ready-to-use media loader

Example:

```go
import (
    "github.com/AltairaLabs/PromptKit/runtime/providers"
    "github.com/AltairaLabs/PromptKit/runtime/storage/local"
)

fileStore := local.NewFileStore(local.FileStoreConfig{
    BaseDir: "./media",
})

loader := providers.NewMediaLoader(providers.MediaLoaderConfig{
    StorageService:  fileStore,
    HTTPTimeout:     30 * time.Second,
    MaxURLSizeBytes: 50 * 1024 * 1024,
})
```

Configuration

MediaLoaderConfig

Configuration for MediaLoader instances.

```go
type MediaLoaderConfig struct {
    StorageService  storage.MediaStorageService // Required for storage references
    HTTPTimeout     time.Duration               // Timeout for URL fetches
    MaxURLSizeBytes int64                       // Max size for URL content
}
```

Fields:
- `StorageService` - Media storage backend
  - Required if loading from storage references
  - Typically a FileStore or cloud storage backend
  - Set to nil if not using media externalization
- `HTTPTimeout` - HTTP request timeout for URLs
  - Default: 30 seconds
  - Applies to URL fetches only
  - Set to 0 for no timeout
- `MaxURLSizeBytes` - Maximum size for URL content
  - Default: 50 MB
  - Prevents downloading huge files
  - Returns an error if the content is larger
Methods
GetBase64Data

Loads media content from any source and returns base64-encoded data.

```go
func (l *MediaLoader) GetBase64Data(
    ctx context.Context,
    media *types.MediaContent,
) (string, error)
```

Parameters:

- `ctx` - Context for cancellation and timeout
- `media` - MediaContent with one or more sources

Returns:

- `string` - Base64-encoded media data
- `error` - Load errors (not found, timeout, size limit, etc.)
Source Priority:
Media is loaded from the first available source in this order:
- Data - Inline base64 data (if present)
- StorageReference - External storage (requires StorageService)
- FilePath - Local file system path
- URL - HTTP/HTTPS URL (with timeout and size limits)
Example:
```go
// Load from any source
data, err := loader.GetBase64Data(ctx, media)
if err != nil {
    log.Printf("Failed to load media: %v", err)
    return err
}

// Use the data
fmt.Printf("Loaded %d bytes\n", len(data))
```

Usage Examples
Basic Usage

```go
// Media with inline data
media := &types.MediaContent{
    Type:     "image",
    MimeType: "image/png",
    Data:     "iVBORw0KGgoAAAANSUhEUg...", // Base64
}

data, err := loader.GetBase64Data(ctx, media)
// Returns media.Data immediately (already inline)
```

Load from Storage

```go
// Media externalized to storage
media := &types.MediaContent{
    Type:     "image",
    MimeType: "image/png",
    StorageReference: &storage.StorageReference{
        ID:      "abc123-def456-ghi789",
        Backend: "file",
    },
}

data, err := loader.GetBase64Data(ctx, media)
// Loads from disk via StorageService
```

Load from File Path

```go
// Media from local file
media := &types.MediaContent{
    Type:     "image",
    MimeType: "image/jpeg",
    FilePath: "/path/to/image.jpg",
}

data, err := loader.GetBase64Data(ctx, media)
// Reads file and converts to base64
```

Load from URL

```go
// Media from HTTP URL
media := &types.MediaContent{
    Type:     "image",
    MimeType: "image/png",
    URL:      "https://example.com/image.png",
}

data, err := loader.GetBase64Data(ctx, media)
// Fetches URL with timeout and size checks
```

Provider Integration
```go
// Provider using MediaLoader
type MyProvider struct {
    mediaLoader *providers.MediaLoader
}

func (p *MyProvider) Predict(ctx context.Context, req providers.PredictionRequest) (providers.PredictionResponse, error) {
    // Load media from message parts
    for _, msg := range req.Messages {
        for _, part := range msg.Parts {
            if part.Media != nil {
                // Load media transparently
                data, err := p.mediaLoader.GetBase64Data(ctx, part.Media)
                if err != nil {
                    return providers.PredictionResponse{}, err
                }

                // Use data in API call
                _ = data
                // ...
            }
        }
    }

    // Call LLM API
    // ...
}
```

Error Handling
MediaLoader returns specific errors:

```go
data, err := loader.GetBase64Data(ctx, media)
if err != nil {
    switch {
    case errors.Is(err, providers.ErrNoMediaSource):
        // No source available (no Data, StorageReference, FilePath, or URL)
    case errors.Is(err, providers.ErrMediaNotFound):
        // Storage reference or file path not found
    case errors.Is(err, providers.ErrMediaTooLarge):
        // URL content exceeds MaxURLSizeBytes
    case errors.Is(err, context.DeadlineExceeded):
        // HTTP timeout or context cancelled
    default:
        // Other errors (permission, network, etc.)
    }
}
```

Performance Considerations

Caching

MediaLoader does not cache loaded media. For repeated access:

```go
// Cache loaded media yourself
mediaCache := make(map[string]string)

data, ok := mediaCache[media.StorageReference.ID]
if !ok {
    var err error
    data, err = loader.GetBase64Data(ctx, media)
    if err != nil {
        return err
    }
    mediaCache[media.StorageReference.ID] = data
}
```

Async Loading

For loading multiple media items in parallel:

```go
type loadResult struct {
    data string
    err  error
}

// Load media concurrently
results := make([]loadResult, len(mediaItems))
var wg sync.WaitGroup

for i, media := range mediaItems {
    wg.Add(1)
    go func(idx int, m *types.MediaContent) {
        defer wg.Done()
        data, err := loader.GetBase64Data(ctx, m)
        results[idx] = loadResult{data, err}
    }(i, media)
}

wg.Wait()

// Check results
for i, result := range results {
    if result.err != nil {
        log.Printf("Failed to load media %d: %v", i, result.err)
    }
}
```

Best Practices
✅ Do:
- Create one MediaLoader per application (reuse)
- Set reasonable HTTP timeout (30s is good default)
- Set MaxURLSizeBytes to prevent abuse
- Handle errors gracefully (media may be unavailable)
- Use context for cancellation support
❌ Don’t:
- Don’t create MediaLoader per request (expensive)
- Don’t ignore errors (media may be corrupted/missing)
- Don’t set timeout too low (large images take time)
- Don’t allow unlimited URL sizes (DoS risk)
- Don’t cache without bounds (memory leak)
Integration with Media Storage
MediaLoader and media storage work together:

```go
// 1. Set up storage
fileStore := local.NewFileStore(local.FileStoreConfig{
    BaseDir: "./media",
})

// 2. Create loader with storage
loader := providers.NewMediaLoader(providers.MediaLoaderConfig{
    StorageService: fileStore,
})

// 3. Use in SDK
manager, _ := sdk.NewConversationManager(
    sdk.WithProvider(provider),
    sdk.WithMediaStorage(fileStore), // Externalizes media
)

// 4. Provider automatically uses the loader
// Media externalized by MediaExternalizer middleware
// Provider loads via MediaLoader when needed
// Application code remains unchanged
```

Flow:

1. User sends image → inline Data
2. LLM returns generated image → inline Data
3. MediaExternalizer → externalizes to storage, clears Data, adds StorageReference
4. State saved → only reference in Redis/Postgres
5. Next turn: Provider needs image → MediaLoader loads from StorageReference
6. Transparent to application code

See Also
- Storage Reference - Media storage backends
- Types Reference - MediaContent structure
- How-To: Configure Media Storage - Setup guide
- Explanation: Media Storage - Design and architecture