Glossary

Definitions of key terms used throughout PromptKit documentation.

Audio Streaming Model (ASM) - A mode for native bidirectional audio streaming with multimodal LLMs like Gemini Live. Audio streams directly to and from the model without separate STT/TTS stages. Compare with VAD.

Automatic Speech Recognition (ASR) - Technology that converts spoken audio into text. Also known as STT (Speech-to-Text). Used in VAD mode pipelines.

Barge-in - The ability for a user to interrupt the AI assistant while it’s speaking. Enables natural back-and-forth conversations in voice interactions. Requires real-time turn detection.

Bit depth - The number of bits used to represent each audio sample. PromptKit typically uses 16-bit audio, which provides good quality for voice while keeping file sizes manageable.

Channels - The number of audio tracks in a stream. Mono (1 channel) is used for voice conversations; stereo (2 channels) is unnecessary for speech.

Bidirectional streaming - A communication pattern where audio or data flows in both directions simultaneously. In PromptKit, duplex sessions enable real-time voice conversations where the user can speak while receiving responses.

Pulse Code Modulation (PCM) - An uncompressed audio format used for streaming and storage. PromptKit uses raw PCM (no headers) at a 16 kHz sample rate for voice input to the Gemini Live API.

Sample rate - The number of audio samples captured per second, measured in Hz. PromptKit uses 16000 Hz (16 kHz) for voice input, which is standard for speech recognition.
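
Because raw PCM carries no headers, buffer sizes for the format described above follow directly from these three numbers. A minimal sketch of the arithmetic (plain Go; none of this is PromptKit API):

```go
package main

import "fmt"

func main() {
	const (
		sampleRate = 16000 // samples per second (16 kHz)
		bitDepth   = 16    // bits per sample
		channels   = 1     // mono
	)
	// Raw PCM has no header, so size is pure arithmetic.
	bytesPerSecond := sampleRate * (bitDepth / 8) * channels
	fmt.Println(bytesPerSecond)             // 32000 bytes per second
	fmt.Println(bytesPerSecond * 60 / 1024) // 1875 KiB (~1.8 MiB) per minute
}
```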

Speech-to-Text (STT) - The process of converting spoken audio into written text. Also known as ASR. In VAD mode, STT converts user speech before sending to the LLM.

Text-to-Speech (TTS) - The process of converting written text into spoken audio. In VAD mode, TTS converts LLM responses into audio for playback.

Turn - A single exchange in a conversation consisting of user input followed by an AI response. See also Multi-turn.

Turn detection - The process of determining when a user has finished speaking. Uses VAD to identify speech boundaries based on silence duration and speech patterns. Also called “endpointing.”

Voice Activity Detection (VAD) - A mode that detects when a user is speaking versus silent. Used to determine turn boundaries in voice conversations. VAD mode pipelines use separate STT and TTS stages. Compare with ASM.
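
To illustrate the silence-duration idea behind endpointing, here is a hypothetical loop over per-frame VAD decisions. The function name and thresholds are invented for the sketch; this is not PromptKit’s implementation:

```go
package endpointing

import "time"

// EndOfTurn reports whether the speaker has finished, given a stream of
// per-frame VAD decisions (true = speech detected in that frame).
// Illustrative only: real endpointing also weighs speech patterns,
// not just trailing silence.
func EndOfTurn(frames <-chan bool, frameDur, silenceLimit time.Duration) bool {
	var silence time.Duration
	for speaking := range frames {
		if speaking {
			silence = 0 // any speech resets the silence window
			continue
		}
		silence += frameDur
		if silence >= silenceLimit { // e.g. 700ms of continuous silence
			return true
		}
	}
	return false // stream ended before a boundary was detected
}
```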

PromptArena - PromptKit’s testing framework for running scenarios against prompts. Executes conversations, validates responses, and generates reports.

Assertion - A test check in Arena that validates expected behavior in a scenario. Assertions run after each turn to verify response content, format, or other criteria. Different from guardrails, which are runtime checks.

Directed Acyclic Graph (DAG) - The structure used by PromptKit’s pipeline, where data flows through stages in a defined order with no cycles, ensuring deterministic execution.

Event bus - A publish-subscribe system that distributes execution events to listeners. Used for observability, logging, and monitoring pipeline execution.

Guardrail - A validator that enforces policies on LLM outputs in real time. Guardrails can detect and prevent policy violations (banned content, PII exposure, etc.) and abort responses early. Different from assertions, which are test-time checks.
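
As a sketch of the runtime-versus-test-time distinction, a guardrail can be thought of as a streaming validator that may abort mid-response. The interface and type below are invented for illustration; PromptKit’s real API may differ:

```go
package guardrails

import (
	"errors"
	"strings"
)

// Guardrail is a hypothetical shape for a runtime validator: it inspects
// LLM output as it streams so a violation can abort the response early.
type Guardrail interface {
	Check(partialOutput string) error
}

// bannedContent aborts when any banned phrase appears in the output so far.
type bannedContent struct{ banned []string }

func (g bannedContent) Check(partial string) error {
	lower := strings.ToLower(partial)
	for _, w := range g.banned {
		if strings.Contains(lower, w) {
			return errors.New("guardrail violation: banned content")
		}
	}
	return nil // no violation yet; streaming may continue
}
```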

Middleware - Pluggable processing components that intercept and transform data flowing through the pipeline. Examples: template rendering, validation, state management.

Mock provider - A fake LLM provider that returns pre-configured responses. Used for deterministic testing without calling real APIs or incurring costs.

Pack - A JSON file (.pack.json) that bundles prompts, templates, and configuration together. Packs are the primary way to organize and distribute prompts in PromptKit.

Pack Compiler - PromptKit’s tool for validating, compiling, and managing prompt packs.

Persona - A configuration that defines how a self-play LLM should behave when simulating a user in Arena tests. Personas have their own system prompts and behavioral guidelines.

Pipeline - A sequence of processing stages that handle a conversation turn. Pipelines can include stages for audio processing, LLM interaction, validation, and more. Implemented as a DAG.

Provider - An abstraction layer for LLM services (OpenAI, Anthropic, Google, etc.). Providers handle API communication and normalize responses across different services.

Replay - Re-running a recorded conversation using captured provider responses instead of calling real LLMs. Enables deterministic debugging and regression testing.

Runtime - PromptKit’s core execution engine that loads packs, manages state, executes pipelines, and coordinates providers.

Scenario - A test definition for Arena that specifies conversations to run, expected behaviors, and assertions to validate.

SDK - PromptKit’s high-level Go library for building conversational applications. Provides a simplified API over the Runtime.

Self-play - An Arena testing mode where an LLM generates simulated user responses based on a persona. Combined with TTS, enables fully automated voice testing without pre-recorded audio.

Session recording - Capturing detailed event streams and artifacts (audio, messages, context) from test runs. Used for debugging, replay, and analysis.

Stage - A single processing step within a pipeline. Examples: LLMStage (calls the model), TTSStage (converts text to speech), VADStage (detects speech boundaries).
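
To make the stage/pipeline relationship concrete, here is a minimal sketch under assumed types. Turn, Stage, and Run are invented for illustration and are not PromptKit’s definitions:

```go
package pipeline

import "context"

// Turn is an illustrative payload passed between stages; PromptKit's real
// types carry much more (audio, messages, state, events).
type Turn struct {
	Input  string
	Output string
}

// Stage is a hypothetical version of a single processing step, such as an
// LLM call, TTS synthesis, or VAD boundary detection.
type Stage interface {
	Process(ctx context.Context, t *Turn) error
}

// Run executes stages in an already topologically sorted DAG order
// (linear here for brevity). Any error stops the pipeline, which keeps
// execution deterministic.
func Run(ctx context.Context, t *Turn, stages ...Stage) error {
	for _, s := range stages {
		if err := s.Process(ctx, t); err != nil {
			return err
		}
	}
	return nil
}
```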

State store - Persistent storage backend (Redis, in-memory, file) that maintains conversation history and context across sessions.

Streaming - Returning LLM output in real time as tokens are generated, rather than waiting for the complete response. Provides faster perceived latency and enables barge-in.

Variable provider - A pluggable component that dynamically provides values for template variables at runtime. Examples: TimeProvider (current time), RAGProvider (retrieved context), StateProvider (conversation state).
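
A minimal sketch of the idea, assuming a single-method interface; the Provider interface here is invented for illustration, and only the TimeProvider name comes from the examples above:

```go
package variables

import "time"

// Provider is a hypothetical variable-provider interface: it resolves one
// template variable's value at render time. PromptKit's actual API may differ.
type Provider interface {
	Value() (string, error)
}

// TimeProvider mirrors the TimeProvider example above, supplying the
// current time for a time-style template variable.
type TimeProvider struct{ Layout string }

func (p TimeProvider) Value() (string, error) {
	return time.Now().Format(p.Layout), nil
}
```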

Context window - The maximum amount of text (measured in tokens) that an LLM can process in a single request, including both input and output.

Embeddings - Dense vector representations of text that capture semantic meaning. Used for similarity scoring, semantic search, and RAG retrieval.
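
Similarity scoring over embeddings is usually cosine similarity; the standard formula in plain Go, independent of any PromptKit API:

```go
package embeddings

import "math"

// CosineSimilarity scores two embedding vectors of equal length:
// 1 means same direction (semantically close), 0 orthogonal, -1 opposite.
func CosineSimilarity(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0 // undefined for zero vectors; treat as no similarity
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}
```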

Few-shot learning - Providing an LLM with example input/output pairs in the prompt to guide its behavior, without requiring fine-tuning.

Grounding - Anchoring LLM responses to factual, verifiable information from external sources. Reduces hallucinations by providing relevant context.

Hallucination - When an LLM generates plausible-sounding but false or fabricated information not supported by its training data or provided context.

Multi-turn - A conversation with multiple back-and-forth exchanges that maintains context across turns. Requires state management to track history.

Ollama - An open-source platform for running LLMs locally. PromptKit’s Ollama provider enables cost-free local inference using models like Llama, Mistral, and LLaVA. Uses an OpenAI-compatible API.
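
Because the endpoint is OpenAI-compatible, a plain HTTP call works. A minimal sketch using only the Go standard library; localhost:11434 and /v1/chat/completions are Ollama conventions, and the model name is just an example:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Ollama's OpenAI-compatible chat endpoint (default local port 11434).
	url := "http://localhost:11434/v1/chat/completions"
	body := []byte(`{"model": "llama3", "messages": [{"role": "user", "content": "Say hello."}]}`)

	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err) // e.g. Ollama is not running locally
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // raw OpenAI-style JSON response
}
```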

vLLM - A high-throughput inference engine optimized for serving LLMs with GPU acceleration. PromptKit’s vLLM provider enables high-performance model serving with advanced features like guided decoding, beam search, and multimodal support. Like Ollama, it is self-hosted with no API costs, but it is built for throughput and performance on GPUs.

Multimodal - Support for multiple content types (text, images, audio, video) in a single interaction. Gemini and GPT-4V are examples of multimodal models.

Prompt - Instructions and context provided to an LLM to guide its response. In PromptKit, prompts are defined in packs with templates for dynamic content.

Prompt engineering - The practice of crafting effective prompts to guide LLM behavior toward desired outputs. Includes techniques like few-shot learning and chain-of-thought.

Retrieval-Augmented Generation (RAG) - A technique that retrieves relevant context from external data sources (documents, databases) and provides it to the LLM. Improves grounding and reduces hallucinations.
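
The core RAG move is retrieve-then-inject. A sketch under an assumed Retriever interface; both names here are hypothetical, not PromptKit’s RAGProvider API:

```go
package rag

import "context"

// Retriever is a hypothetical interface over an external data source
// (vector store, document index); Retrieve returns the k most relevant passages.
type Retriever interface {
	Retrieve(ctx context.Context, query string, k int) ([]string, error)
}

// BuildPrompt fetches context first, then places it in the prompt so the
// LLM's answer is grounded in it rather than in its training data alone.
func BuildPrompt(ctx context.Context, r Retriever, question string) (string, error) {
	passages, err := r.Retrieve(ctx, question, 3)
	if err != nil {
		return "", err
	}
	prompt := "Answer using only the context below.\n\nContext:\n"
	for _, p := range passages {
		prompt += "- " + p + "\n"
	}
	return prompt + "\nQuestion: " + question, nil
}
```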

System prompt - Initial instructions that set the LLM’s behavior, personality, and constraints. Defined in the system_template field of a prompt.

Temperature - A parameter controlling LLM output randomness. Lower values (0.0-0.3) produce more deterministic responses; higher values (0.7-1.0) produce more creative/varied responses.

Token - The basic unit of text processing for LLMs. Roughly 4 characters or 0.75 words in English. Used to measure context window size and API costs.
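
The 4-characters-per-token rule of thumb gives a quick ballpark for context-window and cost planning; a heuristic sketch only, since real tokenizers vary by model:

```go
package tokens

// EstimateTokens applies the rough English-text heuristic above: about
// 4 characters per token. Real tokenizers (BPE variants) will disagree,
// especially for code and non-English text, so treat this as a ballpark.
func EstimateTokens(text string) int {
	return len(text)/4 + 1
}
```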

Tool - A function that an LLM can call to perform actions or retrieve information. Also known as function calling. Defined in prompt packs and executed by the runtime.

Human-in-the-Loop (HITL) - A workflow pattern where certain decisions or tool executions require human approval before proceeding. Used for sensitive operations or quality control.

Model Context Protocol (MCP) - An open standard for connecting LLMs to external tools and data sources. PromptKit supports MCP for tool integration.

WebSocket - A protocol providing full-duplex communication over a single TCP connection. Used by PromptKit for real-time duplex streaming with the Gemini Live API.
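
As a sketch of full-duplex use, the snippet below opens one connection and both writes and reads on it. It uses the third-party gorilla/websocket package; the URL is a placeholder, and this is not PromptKit’s Gemini Live client (a real session needs auth and a setup handshake first):

```go
package main

import (
	"log"

	"github.com/gorilla/websocket" // widely used third-party client
)

func main() {
	// Placeholder URL: substitute the real endpoint and credentials.
	conn, _, err := websocket.DefaultDialer.Dial("wss://example.com/live", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Upstream: raw 16 kHz PCM frames go out as binary messages...
	pcmFrame := make([]byte, 3200) // 100ms of 16-bit mono audio at 16 kHz
	if err := conn.WriteMessage(websocket.BinaryMessage, pcmFrame); err != nil {
		log.Fatal(err)
	}

	// ...while responses stream back on the same connection (full duplex).
	_, msg, err := conn.ReadMessage()
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("received %d bytes", len(msg))
}
```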

See also:

  • Concepts - Detailed explanations of PromptKit concepts
  • Architecture - System design and component relationships