Multimodal Basics Example
This example demonstrates how to configure and use multimodal media support in PromptKit, with runnable PromptArena tests using both real models and mock providers.
Quick Start
1. Setup

    # Copy environment template
    cp .env.example .env

    # Add your Gemini API key to .env (for real model testing)
    echo "GEMINI_API_KEY=your_actual_key" > .env

2. Run Tests

    # Run with mock provider (no API keys needed - perfect for CI/CD)
    promptarena run arena.yaml --provider mock-vision

    # Run with real Gemini vision model (requires API key)
    promptarena run arena.yaml --provider gemini-vision

    # Run all providers and scenarios
    promptarena run arena.yaml

3. View Results

    # View HTML report
    open out/multimodal-report.html

    # Or check JSON results
    cat out/results.json | jq

Structure
    multimodal-basics/
    ├── arena.yaml                        # Arena configuration
    ├── prompts/
    │   ├── image-analyzer.yaml           # Multimodal prompt with MediaConfig
    │   ├── audio-transcriber.yaml        # Audio analysis example
    │   └── mixed-media-assistant.yaml    # Combined media types
    ├── providers/
    │   ├── gemini-vision.yaml            # Real Gemini model configuration
    │   └── mock-vision.yaml              # Mock provider for testing
    ├── scenarios/
    │   ├── image-analysis.yaml           # Single image test
    │   └── image-comparison.yaml         # Multiple images test
    ├── mock-responses.yaml               # Deterministic mock responses
    └── .env.example                      # Environment template

Features
Runnable Scenarios
✅ Image Analysis (scenarios/image-analysis.yaml)
- Tests single image description capabilities
- Validates response includes image-related content
- Checks length constraints (50-2000 characters)
- Tests color and composition analysis
✅ Image Comparison (scenarios/image-comparison.yaml)
- Tests comparing multiple images
- Validates comparative analysis
- Checks professional assessment capabilities
- Tests detailed technical descriptions
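The checks listed above (required keywords and length bounds) can be approximated in a few lines of shell. This is a minimal sketch, not PromptArena's assertion engine: the function name check_response and its argument order are hypothetical, and the case-insensitive substring match used here may differ from how PromptArena evaluates content_includes.

```shell
# Hypothetical helper: check_response TEXT MIN_LEN MAX_LEN KEYWORD...
# Returns 0 when TEXT is within the length bounds and contains every keyword.
check_response() {
  text=$1; min_len=$2; max_len=$3
  shift 3

  # ${#text} expands to the length of the string
  len=${#text}
  if [ "$len" -lt "$min_len" ] || [ "$len" -gt "$max_len" ]; then
    echo "length $len outside ${min_len}-${max_len}" >&2
    return 1
  fi

  # Every remaining argument is a keyword that must appear (case-insensitive)
  for keyword in "$@"; do
    if ! printf '%s' "$text" | grep -iF -- "$keyword" >/dev/null; then
      echo "missing keyword: $keyword" >&2
      return 1
    fi
  done
  return 0
}
```

For example, check_response "$reply" 50 2000 image see mirrors the first-turn checks of the image-analysis scenario.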
Provider Support
✅ Mock Provider (providers/mock-vision.yaml)
- Deterministic testing without API calls
- Scenario-specific responses in mock-responses.yaml
- Turn-by-turn scripted responses
- Perfect for CI/CD testing and development
✅ Gemini Vision (providers/gemini-vision.yaml)
- Real multimodal model testing
- Requires GEMINI_API_KEY in .env
- Tests actual vision capabilities
- Validates MediaConfig constraints work with real models
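Because the Gemini provider needs a key and the mock provider does not, a wrapper script can choose between them automatically. The following is a sketch under stated assumptions: pick_provider is a hypothetical helper, the .env file is assumed to hold a plain GEMINI_API_KEY=value line, and your_actual_key is treated as the unfilled placeholder.

```shell
# Hypothetical helper: print the provider name to use, based on a .env file.
# Assumes a plain KEY=value line; quoting and "export" prefixes are not handled.
pick_provider() {
  env_file=${1:-.env}
  key=""
  if [ -f "$env_file" ]; then
    # Take the last GEMINI_API_KEY= line and strip the variable name
    key=$(grep '^GEMINI_API_KEY=' "$env_file" | tail -n 1 | cut -d= -f2- || true)
  fi

  case "$key" in
    ""|your_actual_key) echo "mock-vision" ;;    # no usable key: stay offline
    *)                  echo "gemini-vision" ;;
  esac
}
```

It could then be combined with the commands shown earlier, for example promptarena run arena.yaml --provider "$(pick_provider .env)".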
MediaConfig Structure
Each prompt includes a media: section that defines capabilities:

    media:
      enabled: true
      supported_types:
        - image

      image:
        max_size_mb: 20
        allowed_formats: [jpeg, png, webp]
        default_detail: high
        max_images_per_msg: 5

      examples:
        - name: "single-image-analysis"
          role: user
          parts:
            - type: text
              text: "What's in this image?"
            - type: image
              media:
                file_path: "./test-images/sample.jpg"
                mime_type: "image/jpeg"

Running Specific Tests
    # Test only image analysis scenario
    promptarena run arena.yaml --scenario image-analysis

    # Test only mock provider
    promptarena run arena.yaml --provider mock-vision

    # Test with specific prompt config
    promptarena run arena.yaml --prompt image-analyzer

    # Increase concurrency for faster testing
    promptarena run arena.yaml --concurrency 5

    # Generate only HTML output
    promptarena run arena.yaml --output-format html

Mock Responses
The mock-responses.yaml file provides deterministic responses for testing:

    scenarios:
      image-analysis:
        turns:
          1: "I can see a landscape with mountains..."
          2: "The color palette is predominantly warm..."

      image-comparison:
        turns:
          1: "Comparing these two images, I can identify..."
          2: "Image 2 appears more professional because..."

Benefits of mock responses:
- ✅ Predictable test results
- ✅ No API costs during development
- ✅ Fast iteration on prompt logic
- ✅ CI/CD testing without secrets
- ✅ Scenario-specific responses
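Scenario-specific responses only apply when a scenario's name matches a key under scenarios: in mock-responses.yaml, so a quick consistency check can catch mismatches before a run. This sketch assumes (the source does not confirm it) that each scenario's name equals its file's base name and that keys are indented by exactly two spaces. check_mock_keys is a hypothetical helper, and a real YAML parser would be more robust than grep.

```shell
# Hypothetical helper: check_mock_keys SCENARIO_DIR MOCK_FILE
# Reports scenario files with no matching two-space-indented key in MOCK_FILE.
check_mock_keys() {
  scenario_dir=$1; mock_file=$2
  missing=0
  for path in "$scenario_dir"/*.yaml; do
    [ -e "$path" ] || continue            # directory had no .yaml files
    name=$(basename "$path" .yaml)
    # Look for a line such as "  image-analysis:" in the mock file
    if ! grep -E "^  ${name}:[[:space:]]*$" "$mock_file" >/dev/null; then
      echo "no mock responses for scenario: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Run from the example directory, this would be called as check_mock_keys scenarios mock-responses.yaml; a non-zero exit status means at least one scenario would fall back to the default mock response.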
Architecture Notes
Media Bundling: NOT Implemented
MediaConfig defines capabilities only (types, sizes, formats supported).
Actual media files are:
- ✅ Provided by users as input at runtime
- ✅ Returned by models as output
- ❌ NOT bundled into prompt packs
Benefits:
- Lightweight packs (no embedded media)
- Clear separation of prompt logic and user data
- Dynamic media handling at runtime
- No storage/versioning concerns for media
Provider Implementation
Multimodal support requires provider-level implementation:
- Gemini 2.0: Images, audio, video ✅
- GPT-4V: Images ✅
- Claude 3: Images ✅
MediaConfig advertises what the prompt supports; actual handling depends on provider capabilities.
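Since MediaConfig only declares limits and the media itself arrives at runtime, a caller may want to pre-check a file before sending it to a provider. The sketch below illustrates the idea using the limits from the example configuration (20 MB; jpeg, png, webp). check_image is a hypothetical helper, not part of PromptKit: it infers the format from the file extension, treats .jpg as jpeg, and counts 1 MB as 1024 * 1024 bytes, all of which are assumptions.

```shell
# Hypothetical helper: check_image FILE [MAX_SIZE_MB]
# Returns 0 when FILE has an allowed extension and is within the size limit.
check_image() {
  file=$1; max_mb=${2:-20}

  # Format check based on the lowercased file extension
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    jpg|jpeg|png|webp) ;;
    *) echo "unsupported format: $ext" >&2; return 1 ;;
  esac

  [ -f "$file" ] || { echo "file not found: $file" >&2; return 1; }

  # wc -c reports the size in bytes; $(( )) strips any padding wc adds
  size=$(($(wc -c < "$file")))
  if [ "$size" -gt $((max_mb * 1024 * 1024)) ]; then
    echo "file too large: $size bytes" >&2
    return 1
  fi
  return 0
}
```

A provider may still reject a file that passes this check, since actual handling depends on the provider's own capabilities.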
Validation
MediaConfig is validated at compile time by PackC:

    # Compile and validate
    packc compile prompts/image-analyzer.yaml -o image-analyzer.pack.json

    # Check media configuration in compiled pack
    jq '.prompts[].media' image-analyzer.pack.json

Validation checks:
- ✅ Supported types are valid (image, audio, video)
- ✅ Size limits are positive numbers
- ✅ Formats are from allowed lists
- ✅ Examples have proper structure (role, parts, media)
- ⚠️ Warns about missing example files (non-blocking)
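To make the first two checks concrete, here is a rough shell illustration of the same rules. It is not PackC's implementation: validate_media is a hypothetical function, the types are passed as a comma-separated string for simplicity, and only whole-number size limits are handled.

```shell
# Hypothetical helper: validate_media TYPES_CSV MAX_SIZE_MB
# Mirrors two of the listed checks: known media types and a positive size limit.
validate_media() {
  types_csv=$1; max_mb=$2

  # Split the comma-separated list and check each type
  old_ifs=$IFS; IFS=,
  for t in $types_csv; do
    case "$t" in
      image|audio|video) ;;
      *) IFS=$old_ifs; echo "invalid media type: $t" >&2; return 1 ;;
    esac
  done
  IFS=$old_ifs

  # Size limit must be a positive whole number (decimals are not handled here)
  case "$max_mb" in
    ""|*[!0-9]*) echo "size limit is not a whole number: $max_mb" >&2; return 1 ;;
  esac
  [ "$max_mb" -gt 0 ] || { echo "size limit must be positive" >&2; return 1; }
  return 0
}
```

With the example configuration, validate_media "image" 20 succeeds, while an unknown type such as pdf or a limit of 0 is rejected.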
Testing
Unit Tests

    # Test media validation logic
    cd ../../runtime/prompt
    go test -v -run TestValidateMediaConfig

    # Test pack loading with MediaConfig
    go test -v -run TestLoadPackWithMediaConfig

Test coverage:
- 53 comprehensive test cases
- 77.8-100% coverage across media validation functions
- Tests for all media types and configurations
- Edge cases and error conditions
Integration Tests (Arena)
    # Run all scenarios with both providers
    promptarena run arena.yaml

    # Run with verbose output
    promptarena run arena.yaml -v

    # Generate detailed HTML report
    promptarena run arena.yaml --output-dir ./test-results

Expected Output
Successful Mock Run

    Running Arena: multimodal-basics
    Loaded 1 prompt configs
    Loaded 2 providers
    Loaded 2 scenarios

    Running scenario: image-analysis (mock-vision)
      Turn 1: ✓ Passed (content_includes: "image", "see")
      Turn 1: ✓ Passed (min_length: 50 chars)
      Turn 1: ✓ Passed (max_length: 2000 chars)
      Turn 2: ✓ Passed (content_includes: "color")
      Turn 2: ✓ Passed (min_length: 30 chars)

    Running scenario: image-comparison (mock-vision)
      Turn 1: ✓ Passed (content_includes: "image", "difference")
      Turn 1: ✓ Passed (min_length: 100 chars)
      Turn 2: ✓ Passed (content_includes: "professional")

    Summary:
      Total Tests: 8
      Passed: 8
      Failed: 0
      Success Rate: 100%

    Report saved to: out/multimodal-report.html

Troubleshooting
Missing API Key

    Error: GEMINI_API_KEY not set
    Solution: cp .env.example .env && edit .env with your key

Provider Not Found

    Error: Provider 'gemini-vision' not found
    Solution: Check that providers/gemini-vision.yaml exists and arena.yaml references it correctly

Assertion Failures

Check the HTML report for detailed failure information:

    open out/multimodal-report.html
    # Look for red X marks and click for details

Mock Responses Not Working

    Error: Mock provider returning default response
    Solution: Check that scenario name in arena.yaml matches key in mock-responses.yaml

Next Steps
1. Add Real Images

Create test images for actual multimodal testing:

    mkdir test-images
    # Add sample images: sample-photo.jpg, before.jpg, after.jpg

2. Extend Scenarios

Add more complex test cases:
- OCR text extraction from images
- Object detection validation
- Style transfer comparison
- Multi-step image reasoning
3. Add Audio/Video Support
Test other media types:

    # Copy and modify audio-transcriber.yaml
    # Add audio scenarios
    # Configure mock responses for audio tests

4. CI/CD Integration

Use mock provider in automated tests:

    # .github/workflows/test.yml
    - name: Test Multimodal
      run: |
        cd examples/multimodal-basics
        promptarena run arena.yaml --provider mock-vision

Related Documentation

Related Examples
- assertions-test/ - Testing with assertions and mock providers
- customer-support-integrated/ - Complex real-world scenario
- guardrails-test/ - Content validation examples