
Manage Context

Learn how to configure context management and truncation for long conversations.

When conversations grow long, they may exceed the LLM’s context window. The SDK provides context management options:

  • Token budget - Maximum tokens for context
  • Truncation strategy - How to reduce context when over budget
  • Relevance truncation - Use embeddings to keep relevant messages

Use WithTokenBudget to limit the total tokens used for context:

conv, _ := sdk.Open("./app.pack.json", "chat",
    sdk.WithTokenBudget(8000),
)

When the conversation exceeds this budget, older messages are truncated.
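The SDK does its own token counting; for sizing the budget, a rough rule of thumb is about four characters per token of English text. The helper below is only an illustration of that heuristic, not part of the SDK:

```go
package main

import "fmt"

// estimateTokens is a rough heuristic (~4 characters per token for
// English text), useful only for ballpark budget sizing. The SDK
// performs its own, more accurate counting internally.
func estimateTokens(text string) int {
	return len(text) / 4
}

func main() {
	msg := "I'm having trouble with my account billing"
	fmt.Println(estimateTokens(msg)) // roughly 10 tokens
}
```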

The sliding strategy removes the oldest messages first:

conv, _ := sdk.Open("./app.pack.json", "chat",
    sdk.WithTokenBudget(8000),
    sdk.WithTruncation("sliding"),
)

Relevance truncation keeps semantically relevant messages, scored with embeddings:

import (
    "github.com/AltairaLabs/PromptKit/runtime/providers/openai"
    "github.com/AltairaLabs/PromptKit/sdk"
)

// Create an embedding provider
embProvider, _ := openai.NewEmbeddingProvider()

conv, _ := sdk.Open("./app.pack.json", "chat",
    sdk.WithTokenBudget(8000),
    sdk.WithRelevanceTruncation(&sdk.RelevanceConfig{
        EmbeddingProvider:   embProvider,
        MinRecentMessages:   3,
        SimilarityThreshold: 0.3,
    }),
)

For applications with persistent state (a state store), you can use a three-tier approach instead of truncation to manage long conversations efficiently:

  1. Hot window — load only the most recent N messages
  2. Semantic retrieval — find relevant older messages using embeddings
  3. Auto-summarization — compress old turns into summaries

These options use ContextAssemblyStage and IncrementalSaveStage internally, avoiding the need to load and save the full conversation history on every turn.

Load only the last N messages instead of the full history:

conv, _ := sdk.Open("./app.pack.json", "chat",
    sdk.WithStateStore(store),
    sdk.WithConversationID("session-123"),
    sdk.WithContextWindow(20), // Keep last 20 messages
)

Requires WithStateStore and WithConversationID. When the store implements the optional MessageReader and MessageAppender interfaces (as RedisStore and MemoryStore do), only the tail of the conversation is loaded and new messages are appended incrementally. Falls back to full Load/Save otherwise.

On each turn, embed the user’s message and search older messages (outside the hot window) for semantic matches:

embProvider, _ := openai.NewEmbeddingProvider()

conv, _ := sdk.Open("./app.pack.json", "chat",
    sdk.WithStateStore(store),
    sdk.WithConversationID("session-123"),
    sdk.WithContextWindow(20),
    sdk.WithContextRetrieval(embProvider, 5), // Retrieve top 5 matches
)

Retrieved messages are inserted chronologically between summaries and the hot window. Requires WithContextWindow to be set.
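That assembly order (summaries first, then retrieved older messages in chronological order, then the hot window) can be sketched as follows; this is a conceptual illustration, not the SDK's internals:

```go
package main

import "fmt"

// assembleContext mirrors the documented ordering: summaries, then
// chronologically sorted retrieved messages, then the hot window of
// recent messages.
func assembleContext(summaries, retrieved, hotWindow []string) []string {
	ctx := make([]string, 0, len(summaries)+len(retrieved)+len(hotWindow))
	ctx = append(ctx, summaries...)
	ctx = append(ctx, retrieved...) // assumed already in chronological order
	ctx = append(ctx, hotWindow...)
	return ctx
}

func main() {
	ctx := assembleContext(
		[]string{"[summary of turns 1-10]"},
		[]string{"turn 14: billing issue", "turn 17: charge on Dec 15"},
		[]string{"turn 41", "turn 42"},
	)
	fmt.Println(ctx[0], "->", ctx[len(ctx)-1]) // [summary of turns 1-10] -> turn 42
}
```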

When the message count exceeds a threshold, compress the oldest unsummarized batch into a summary:

conv, _ := sdk.Open("./app.pack.json", "chat",
    sdk.WithStateStore(store),
    sdk.WithConversationID("session-123"),
    sdk.WithContextWindow(20),
    sdk.WithAutoSummarize(summaryProvider, 50, 10), // threshold=50, batchSize=10
)

Summaries are prepended to the context as system messages. You can use a cheaper or faster model for the summary provider (e.g., gpt-4o-mini) to minimize cost.

Use all three tiers together for the most efficient long conversation handling:

import (
    "github.com/AltairaLabs/PromptKit/runtime/providers/openai"
    "github.com/AltairaLabs/PromptKit/runtime/statestore"
    "github.com/AltairaLabs/PromptKit/sdk"
)

store := statestore.NewMemoryStore()
embProvider, _ := openai.NewEmbeddingProvider()
summaryProvider := ... // your LLM provider for summarization

conv, _ := sdk.Open("./app.pack.json", "chat",
    sdk.WithStateStore(store),
    sdk.WithConversationID("session-123"),
    sdk.WithContextWindow(20),
    sdk.WithContextRetrieval(embProvider, 5),
    sdk.WithAutoSummarize(summaryProvider, 50, 10),
)

During multi-round tool execution, tool results accumulate and can exhaust the context window. The context compactor runs automatically between rounds and folds stale tool results into compact summaries (e.g., [file_read: package main... — 12000 bytes compacted]).

Compaction is on by default. It:

  • Triggers when estimated context exceeds 70% of the token budget
  • Folds oldest tool results first, preserving the most recent 4 messages
  • Never modifies error results, system messages, or non-tool messages
  • Emits a context.compacted event with token count details

The token budget is auto-detected from the provider when it implements ContextWindowProvider; otherwise it defaults to 128K tokens.
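That detection and the 70% trigger can be sketched as below; the ContextWindowProvider interface shape here is an assumption for illustration:

```go
package main

import "fmt"

// Assumed shape of the optional interface; the real PromptKit
// signature may differ.
type ContextWindowProvider interface {
	ContextWindow() int // provider's maximum context size in tokens
}

// resolveBudget sketches the documented detection: use the provider's
// advertised window when available, else fall back to 128K tokens.
func resolveBudget(p any) int {
	if cw, ok := p.(ContextWindowProvider); ok {
		return cw.ContextWindow()
	}
	return 128_000
}

// shouldCompact sketches the 70%-of-budget trigger.
func shouldCompact(estimatedTokens, budget int) bool {
	return estimatedTokens > budget*7/10
}

type fakeProvider struct{}

func (fakeProvider) ContextWindow() int { return 200_000 }

func main() {
	fmt.Println(resolveBudget(fakeProvider{}), resolveBudget(struct{}{})) // 200000 128000
	fmt.Println(shouldCompact(95_000, 128_000))                           // true (95K > 89.6K)
}
```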

To disable compaction:

conv, _ := sdk.Open("./app.pack.json", "chat",
    sdk.WithCompaction(false),
)
OpenAI:

import "github.com/AltairaLabs/PromptKit/runtime/providers/openai"

// Default model (text-embedding-3-small)
embProvider, _ := openai.NewEmbeddingProvider()

// Custom model
embProvider, _ := openai.NewEmbeddingProvider(
    openai.WithEmbeddingModel("text-embedding-3-large"),
)
Gemini:

import "github.com/AltairaLabs/PromptKit/runtime/providers/gemini"

// Default model (text-embedding-004)
embProvider, _ := gemini.NewEmbeddingProvider()

// Custom model
embProvider, _ := gemini.NewEmbeddingProvider(
    gemini.WithGeminiEmbeddingModel("embedding-001"),
)

Voyage AI is recommended by Anthropic for Claude-based systems:

import "github.com/AltairaLabs/PromptKit/runtime/providers/voyageai"

// Default model (voyage-3.5)
embProvider, _ := voyageai.NewEmbeddingProvider()

// Code-optimized model
embProvider, _ := voyageai.NewEmbeddingProvider(
    voyageai.WithModel("voyage-code-3"),
)

// With input type hint for retrieval
embProvider, _ := voyageai.NewEmbeddingProvider(
    voyageai.WithInputType(voyageai.InputTypeQuery),
)
The full set of RelevanceConfig options:

&sdk.RelevanceConfig{
    // Required: embedding provider
    EmbeddingProvider: embProvider,

    // Always keep the N most recent messages (default: 3)
    MinRecentMessages: 3,

    // Never truncate system messages (default: true)
    AlwaysKeepSystemRole: true,

    // Minimum similarity score to keep a message (default: 0.0)
    SimilarityThreshold: 0.3,

    // What to compare messages against (default: "last_user")
    QuerySource: "last_user", // or "last_n", "custom"

    // For QuerySource "last_n": how many recent messages to use
    LastNCount: 3,

    // For QuerySource "custom": the query text
    CustomQuery: "customer billing issue",
}

Control what relevance is computed against:

Compare against the most recent user message:

&sdk.RelevanceConfig{
    EmbeddingProvider: embProvider,
    QuerySource:       "last_user",
}

Compare against multiple recent messages:

&sdk.RelevanceConfig{
    EmbeddingProvider: embProvider,
    QuerySource:       "last_n",
    LastNCount:        5,
}

Compare against a specific query:

&sdk.RelevanceConfig{
    EmbeddingProvider: embProvider,
    QuerySource:       "custom",
    CustomQuery:       "technical support for billing",
}
A complete example combining a token budget with relevance truncation:

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/AltairaLabs/PromptKit/runtime/providers/openai"
    "github.com/AltairaLabs/PromptKit/sdk"
)

func main() {
    ctx := context.Background()

    // Create embedding provider for relevance truncation
    embProvider, err := openai.NewEmbeddingProvider()
    if err != nil {
        log.Fatal(err)
    }

    // Open conversation with context management
    conv, err := sdk.Open("./app.pack.json", "chat",
        sdk.WithTokenBudget(8000),
        sdk.WithRelevanceTruncation(&sdk.RelevanceConfig{
            EmbeddingProvider:    embProvider,
            MinRecentMessages:    3,
            SimilarityThreshold:  0.3,
            AlwaysKeepSystemRole: true,
        }),
    )
    if err != nil {
        log.Fatal(err)
    }
    defer conv.Close()

    // Simulate a long conversation
    messages := []string{
        "I'm having trouble with my account billing",
        "The charge appeared on December 15th",
        "What's the weather like today?", // Unrelated - may be truncated
        "Can you look up my order history?",
        "Back to my billing - can you help fix it?",
    }

    for _, msg := range messages {
        resp, err := conv.Send(ctx, msg)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("User: %s\nAssistant: %s\n\n", msg, resp.Text())
    }
}
Tips for keeping relevance truncation fast:

  1. Cache embeddings: enabled by default in RelevanceConfig
  2. Use smaller models: text-embedding-3-small is faster than text-embedding-3-large
  3. Set an appropriate threshold: a higher threshold keeps fewer messages, so scoring and assembly are faster
  4. Batch requests: the SDK batches embedding requests automatically
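The threshold in tip 3 works because a message is kept only when its similarity to the query embedding clears SimilarityThreshold. A minimal cosine-similarity sketch of that filter (the SDK's scoring may differ in detail):

```go
package main

import (
	"fmt"
	"math"
)

// cosine computes the cosine similarity between two embedding vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// keepRelevant keeps only messages whose similarity to the query
// embedding meets the threshold. Illustrative sketch only.
func keepRelevant(query []float64, msgs map[string][]float64, threshold float64) []string {
	var kept []string
	for text, emb := range msgs {
		if cosine(query, emb) >= threshold {
			kept = append(kept, text)
		}
	}
	return kept
}

func main() {
	query := []float64{1, 0} // embedding of the last user message
	msgs := map[string][]float64{
		"billing question": {0.9, 0.1}, // similar to the query: kept
		"weather chat":     {0, 1},     // orthogonal: dropped at 0.3
	}
	fmt.Println(keepRelevant(query, msgs, 0.3))
}
```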

Set API keys for embedding providers:

# OpenAI
export OPENAI_API_KEY=sk-...
# Gemini
export GEMINI_API_KEY=...
# Voyage AI
export VOYAGE_API_KEY=...