# Manage Context
Learn how to configure context management and truncation strategies for long conversations.
## Overview

When conversations grow long, they may exceed the LLM’s context window or token budget. PromptKit provides context management strategies to handle this automatically:
- **Truncate oldest messages** - Remove the earliest messages first (simple, fast)
- **Truncate by relevance** - Use embeddings to keep semantically relevant messages (smarter)
## Basic Configuration

Add `context_policy` to your scenario:

```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: long-conversation-test
spec:
  task_type: support
  context_policy:
    token_budget: 8000
    strategy: truncate_oldest
  turns:
    - role: user
      content: "Start of conversation..."
    # ... many turns ...
```

## Truncation Strategies

### Truncate Oldest (Default)

Removes the oldest messages when approaching the token budget:

```yaml
context_policy:
  token_budget: 8000
  strategy: truncate_oldest
```

**Pros:** Fast, predictable
**Cons:** May remove contextually important early messages
### Relevance-Based Truncation

Uses embedding similarity to keep the most relevant messages:

```yaml
context_policy:
  token_budget: 8000
  strategy: relevance
  relevance:
    provider: openai
    model: text-embedding-3-small
    min_recent_messages: 3
    similarity_threshold: 0.3
```

**Pros:** Preserves semantically important context
**Cons:** Requires embedding API calls (additional latency/cost)
## Relevance Configuration

### Provider Options

Choose an embedding provider:

```yaml
# OpenAI (recommended for general use)
relevance:
  provider: openai
  model: text-embedding-3-small # or text-embedding-3-large
```

```yaml
# Gemini
relevance:
  provider: gemini
  model: text-embedding-004
```

```yaml
# Voyage AI (recommended for retrieval tasks)
relevance:
  provider: voyageai
  model: voyage-3.5 # or voyage-code-3 for code
```

### Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `provider` | string | required | Embedding provider: `openai`, `gemini`, `voyageai` |
| `model` | string | provider default | Embedding model to use |
| `min_recent_messages` | int | `3` | Always keep the N most recent messages |
| `similarity_threshold` | float | `0.3` | Minimum similarity score (0-1) to keep a message |
| `always_keep_system` | bool | `true` | Never truncate system messages |
| `cache_embeddings` | bool | `true` | Cache embeddings for performance |
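
For reference, a relevance policy with every parameter written out might look like the following sketch. The values shown are the defaults from the table above; the `token_budget` is illustrative, not a recommendation:

```yaml
context_policy:
  token_budget: 8000
  strategy: relevance
  relevance:
    provider: openai
    model: text-embedding-3-small
    min_recent_messages: 3
    similarity_threshold: 0.3
    always_keep_system: true
    cache_embeddings: true
```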
## Example: Code Assistant

For code-related conversations, use a code-optimized model:

```yaml
context_policy:
  token_budget: 16000
  strategy: relevance
  relevance:
    provider: voyageai
    model: voyage-code-3
    min_recent_messages: 5
    similarity_threshold: 0.25
```

## Example: Customer Support
For support conversations, preserve context about the customer’s issue:

```yaml
context_policy:
  token_budget: 8000
  strategy: relevance
  relevance:
    provider: openai
    model: text-embedding-3-small
    min_recent_messages: 3
    similarity_threshold: 0.35
    always_keep_system: true
```

## How Relevance Truncation Works
1. **Compute query embedding** - Embeds the most recent user message(s)
2. **Score all messages** - Computes cosine similarity between the query and each message
3. **Apply rules:**
   - Always keep system messages (if `always_keep_system: true`)
   - Always keep the last N messages (per `min_recent_messages`)
   - Keep messages with similarity >= the threshold
4. **Truncate remaining** - Remove the lowest-scoring messages until under budget
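
The following minimal sketch shows how such a pass could be implemented. It is illustrative only, not PromptKit’s implementation: the `embed` callable, the message dictionaries, and the per-message `tokens` counts are assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def truncate_by_relevance(messages, embed, token_budget,
                          min_recent=3, threshold=0.3, keep_system=True):
    # 1. Compute the query embedding from the most recent user message.
    query = next(m for m in reversed(messages) if m["role"] == "user")
    q_vec = embed(query["content"])

    # 2. Score every message against the query.
    scores = [cosine(q_vec, embed(m["content"])) for m in messages]

    def protected(i):
        # System messages and the last N messages are never dropped.
        return ((keep_system and messages[i]["role"] == "system")
                or i >= len(messages) - min_recent)

    # 3. Apply the keep rules: protected messages plus anything at or
    #    above the similarity threshold.
    keep = {i for i in range(len(messages))
            if protected(i) or scores[i] >= threshold}

    # 4. Still over budget? Drop the lowest-scoring unprotected
    #    messages until the total fits.
    def total_tokens():
        return sum(messages[i]["tokens"] for i in keep)

    droppable = sorted((i for i in keep if not protected(i)),
                       key=lambda i: scores[i])
    for i in droppable:
        if total_tokens() <= token_budget:
            break
        keep.discard(i)

    return [messages[i] for i in sorted(keep)]
```

PromptKit additionally batches embedding calls and caches results (see `cache_embeddings` above), which this sketch omits for brevity.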
## Environment Variables

Set API keys for embedding providers:

```bash
# OpenAI
export OPENAI_API_KEY=sk-...

# Gemini
export GEMINI_API_KEY=...

# Voyage AI
export VOYAGE_API_KEY=...
```

## Complete Example
Section titled “Complete Example”apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Scenariometadata: name: extended-support-conversation labels: category: support context: managed
spec: task_type: support description: "Tests context management in long conversations"
context_policy: token_budget: 8000 strategy: relevance relevance: provider: openai model: text-embedding-3-small min_recent_messages: 3 similarity_threshold: 0.3
turns: - role: user content: "I'm having trouble with my account billing"
- role: user content: "The charge appeared on December 15th"
- role: user content: "What's the weather like today?" # This unrelated message may be truncated
- role: user content: "Back to my billing issue - can you help?" assertions: - type: content_includes params: patterns: ["billing", "charge", "December"] message: "Should remember billing context"Performance Considerations
- **Caching:** Enable `cache_embeddings: true` to avoid re-computing embeddings
- **Model size:** Smaller models (e.g., `text-embedding-3-small`) are faster
- **Batch size:** Embeddings are computed in batches for efficiency
- **Token budget:** Set appropriately for your LLM’s context window
## Troubleshooting

### Missing API Key

```
Error: openai API key not found: set OPENAI_API_KEY environment variable
```

Set the required environment variable for your embedding provider.
### High Latency

If truncation is slow:

- Enable embedding caching
- Use a smaller embedding model
- Increase `similarity_threshold` to keep fewer messages
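
For example, a latency-oriented adjustment might look like this (values are illustrative starting points, not recommendations):

```yaml
relevance:
  cache_embeddings: true
  model: text-embedding-3-small # smaller, faster model
  similarity_threshold: 0.5     # keep fewer messages
```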
### Context Lost

If important context is being truncated:

- Lower `similarity_threshold`
- Increase `min_recent_messages`
- Ensure `always_keep_system: true`
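
For instance (illustrative values):

```yaml
relevance:
  similarity_threshold: 0.2 # keep more loosely related messages
  min_recent_messages: 5    # protect a longer recent window
  always_keep_system: true
```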
## See Also

- Write Scenarios - Scenario configuration basics
- Configure Providers - Provider setup
- SDK Context Management - Programmatic context control