Runtime Explanation
Understanding the architecture and design of PromptKit Runtime.
Purpose
These explanations help you understand why Runtime works the way it does. They cover architectural decisions, design patterns, and the reasoning behind Runtime’s implementation.
Topics
Architecture
- Pipeline Architecture - How the stage-based streaming pipeline works
- Provider System - LLM provider abstraction and implementation
Integration
- Stage Design - Composable stage patterns
- State Management - Conversation history and persistence
When to Read These
Read explanations when you want to:
- Understand architectural decisions
- Learn design patterns used in Runtime
- Extend Runtime with custom components
- Troubleshoot complex issues
- Contribute to Runtime development
Don’t read these when you need to:
- Quickly complete a specific task → See How-To Guides
- Learn Runtime from scratch → See Tutorials
- Look up API details → See Reference
Key Concepts
Stage-Based Architecture
Runtime uses a stage-based streaming architecture in which each stage processes elements in its own goroutine (see the sketch after this list). This provides:
- True Streaming: Elements flow through as they’re produced
- Concurrency: Stages run in parallel
- Composability: Mix and match stages
- Backpressure: Channel-based flow control
- Testability: Test stages in isolation
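A minimal sketch of the pattern in Go. The `Element` type and `upperStage` function here are illustrative placeholders, not Runtime's actual API; they only show the goroutine-per-stage, channel-connected shape:

```go
package main

import (
	"fmt"
	"strings"
)

// Element is a placeholder for whatever flows through the pipeline.
type Element struct {
	Text string
}

// upperStage illustrates the stage pattern: read from `in`, transform,
// write to `out`, all inside the stage's own goroutine.
func upperStage(in <-chan Element) <-chan Element {
	out := make(chan Element)
	go func() {
		defer close(out) // propagate end-of-stream downstream
		for el := range in {
			el.Text = strings.ToUpper(el.Text)
			out <- el
		}
	}()
	return out
}

func main() {
	in := make(chan Element)
	out := upperStage(in)
	go func() {
		defer close(in)
		in <- Element{Text: "hello"}
	}()
	for el := range out {
		fmt.Println(el.Text) // HELLO
	}
}
```

Because the channel is unbuffered, a slow consumer naturally throttles the producer; that is the backpressure the list above refers to.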
Provider Abstraction
Runtime abstracts LLM providers behind a common interface (sketched after this list), enabling:
- Provider independence: Switch providers without code changes
- Fallback strategies: Try multiple providers automatically
- Consistent API: Same code for OpenAI, Claude, Gemini
- Easy testing: Mock providers for tests
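A hedged sketch of what such an abstraction can look like; `Provider`, `Complete`, and `runWithFallback` are assumed names for illustration, not Runtime's real signatures:

```go
package main

import (
	"context"
	"fmt"
)

// Provider is a hypothetical common interface. Every backend
// (OpenAI, Claude, Gemini, or a mock) would satisfy it.
type Provider interface {
	Complete(ctx context.Context, prompt string) (string, error)
}

// mockProvider shows how testing becomes easy: swap in a canned reply.
type mockProvider struct{ reply string }

func (m mockProvider) Complete(ctx context.Context, prompt string) (string, error) {
	return m.reply, nil
}

// runWithFallback tries providers in order, illustrating a fallback strategy.
func runWithFallback(ctx context.Context, prompt string, providers ...Provider) (string, error) {
	var err error
	for _, p := range providers {
		var out string
		if out, err = p.Complete(ctx, prompt); err == nil {
			return out, nil
		}
	}
	return "", err
}

func main() {
	out, _ := runWithFallback(context.Background(), "hi", mockProvider{reply: "hello"})
	fmt.Println(out)
}
```

Because callers depend only on the interface, switching from one backend to another is a constructor change, not a code rewrite.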
Tool System
Runtime implements function calling through the following pieces (see the sketch after this list):
- Tool descriptors: Define tools with schemas
- Executors: Pluggable tool execution backends
- MCP integration: Connect external tool servers
- Automatic calling: LLM decides when to use tools
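To make the moving parts concrete, here is a hypothetical descriptor shape; `ToolDescriptor` and its fields are assumptions for illustration, not Runtime's actual types:

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
)

// ToolDescriptor is an illustrative sketch: a schema the LLM sees,
// plus a pluggable executor the runtime invokes when the model calls it.
type ToolDescriptor struct {
	Name        string
	Description string
	Schema      json.RawMessage // JSON Schema for the tool's arguments
	Execute     func(ctx context.Context, args json.RawMessage) (string, error)
}

func main() {
	weather := ToolDescriptor{
		Name:        "get_weather",
		Description: "Return current weather for a city",
		Schema:      json.RawMessage(`{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}`),
		Execute: func(ctx context.Context, args json.RawMessage) (string, error) {
			var a struct {
				City string `json:"city"`
			}
			if err := json.Unmarshal(args, &a); err != nil {
				return "", err
			}
			return "sunny in " + a.City, nil // a real executor would call a backend
		},
	}

	// When the model emits a tool call, the runtime looks up the descriptor
	// by name and runs its executor with the model-supplied arguments.
	out, _ := weather.Execute(context.Background(), json.RawMessage(`{"city":"Oslo"}`))
	fmt.Println(weather.Name, "->", out)
}
```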
Architecture Overview
```
Input
  ↓
Pipeline
  ├── StateStoreLoad Stage  (load conversation)
  ├── PromptAssembly Stage  (apply templates)
  ├── Validation Stage      (check content)
  └── Provider Stage        (call LLM)
        ├── Tool Registry   (available tools)
        └── Provider        (OpenAI / Claude / Gemini)
  ↓
Output
```
Each stage runs concurrently in its own goroutine, connected by channels.
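Since every stage shares the same channel-in/channel-out shape, wiring the diagram above amounts to function composition. A hedged, self-contained sketch in which the `Stage` type, `compose`, and the stage names are illustrative, not Runtime's API:

```go
package main

import "fmt"

type Element struct{ Text string }

// Stage is the common shape every pipeline stage shares.
type Stage func(<-chan Element) <-chan Element

// compose chains stages left to right, mirroring the diagram above.
func compose(stages ...Stage) Stage {
	return func(in <-chan Element) <-chan Element {
		for _, s := range stages {
			in = s(in)
		}
		return in
	}
}

// tag returns a trivial stage that appends a label, standing in for
// real stages like prompt assembly or validation.
func tag(label string) Stage {
	return func(in <-chan Element) <-chan Element {
		out := make(chan Element)
		go func() {
			defer close(out)
			for el := range in {
				el.Text += " -> " + label
				out <- el
			}
		}()
		return out
	}
}

func main() {
	pipeline := compose(tag("load"), tag("assemble"), tag("validate"), tag("provider"))
	in := make(chan Element, 1)
	in <- Element{Text: "input"}
	close(in)
	for el := range pipeline(in) {
		fmt.Println(el.Text) // input -> load -> assemble -> validate -> provider
	}
}
```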
Pipeline Modes
Runtime supports three execution modes:
Text Mode
Standard HTTP-based LLM interactions for chat and completion.
VAD Mode
Voice Activity Detection for voice applications using text-based LLMs: Audio → STT → LLM → TTS
ASM Mode
Audio Streaming Mode for native multimodal LLMs with real-time audio via WebSocket.
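A small, purely illustrative sketch of dispatching on these modes; the `Mode` constants and `describe` function are assumptions, not Runtime's configuration API:

```go
package main

import "fmt"

// Mode enumerates the three execution modes described above.
// Names are illustrative, not Runtime's actual identifiers.
type Mode int

const (
	TextMode Mode = iota // HTTP chat/completion
	VADMode              // Audio -> STT -> LLM -> TTS
	ASMMode              // WebSocket streaming to a native audio LLM
)

func describe(m Mode) string {
	switch m {
	case TextMode:
		return "text: HTTP-based LLM calls"
	case VADMode:
		return "vad: speech segmented by voice activity, routed through STT/TTS"
	case ASMMode:
		return "asm: bidirectional audio over WebSocket"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(describe(VADMode))
}
```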
Design Principles
1. Streaming First
- Channel-based data flow
- Concurrent stage execution
- True streaming (not simulated)
2. Composability
- Stage pipeline is flexible
- Mix components as needed
- Build custom pipelines
3. Extensibility
- Custom stages supported (see the sketch after this list)
- Custom providers possible
- Custom tools and executors
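As an example of extensibility, a custom stage only needs to match the channel-transformer shape; everything below (`redactStage`, `Element`) is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

type Element struct{ Text string }

// redactStage is an illustrative custom stage: because a stage is just
// a channel transformer, user code can add behavior (here, masking a
// secret before it reaches the provider) without touching Runtime internals.
func redactStage(secret string, in <-chan Element) <-chan Element {
	out := make(chan Element)
	go func() {
		defer close(out)
		for el := range in {
			el.Text = strings.ReplaceAll(el.Text, secret, "[redacted]")
			out <- el
		}
	}()
	return out
}

func main() {
	in := make(chan Element, 1)
	in <- Element{Text: "my password is hunter2"}
	close(in)
	for el := range redactStage("hunter2", in) {
		fmt.Println(el.Text) // my password is [redacted]
	}
}
```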
4. Production-Ready
- Error handling built-in
- Resource cleanup automatic (see the sketch after this list)
- Monitoring capabilities
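A sketch of the cleanup pattern behind the resource-cleanup point, assuming context-based cancellation (the names are illustrative, not Runtime's API): selecting on `ctx.Done()` stops the stage goroutine, and the deferred `close` signals end-of-stream downstream instead of leaking.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// tick is a toy source stage showing the shutdown pattern: a cancelled
// context stops the goroutine, and defer close(out) tells downstream
// stages the stream has ended.
func tick(ctx context.Context) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 0; ; i++ {
			select {
			case <-ctx.Done():
				return // cancelled: goroutine exits, channel closes
			case out <- i:
				time.Sleep(2 * time.Millisecond) // simulate work
			}
		}
	}()
	return out
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	for n := range tick(ctx) {
		fmt.Println(n)
	}
	fmt.Println("pipeline shut down cleanly")
}
```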
Related Documentation
- Reference: Complete API documentation
- How-To Guides: Task-focused guides
- Tutorials: Step-by-step learning
Contributing
Understanding these architectural concepts is valuable when contributing to Runtime. See the Runtime codebase for implementation details.