Session Recording Architecture

This document explains how PromptKit’s session recording system captures, stores, and replays LLM conversations with full fidelity.

Session recording provides a complete audit trail of LLM interactions. Unlike simple logging, recordings capture:

  • Precise timing: Millisecond-accurate event timestamps
  • Complete data: All messages, tool calls, and media
  • Reconstructable state: Enough information to replay conversations exactly

┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│  Live Session   │ ──►  │   Event Store   │ ──►  │    Recording    │
│    (Emitter)    │      │     (JSONL)     │      │    (Replay)     │
└─────────────────┘      └─────────────────┘      └─────────────────┘
         │                        │                        │
         ▼                        ▼                        ▼
     Real-time               Persistent              Synchronized
       events                  storage                 playback

The Emitter is the heart of session recording. It captures events as they occur:

emitter := events.NewEmitter(eventBus, runID, scenarioID, conversationID)
// Events are emitted automatically during conversation
emitter.ConversationStarted(systemPrompt)
emitter.MessageCreated(role, content)
emitter.AudioInput(audioData)
emitter.ProviderCallStarted(provider, model)
emitter.ToolCallStarted(toolName, args)
// ...

Events flow through an EventBus to registered subscribers:

Emitter ──► EventBus ──┬──► FileEventStore (persistence)
                       ├──► Metrics Collector
                       └──► Real-time UI
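
If the bus follows the usual Go callback style, a subscriber might register like the following sketch (Subscribe and its handler signature are assumptions, not the documented API):

// Illustrative subscriber registration; consult the events package for
// the real EventBus signature.
eventBus.Subscribe(func(e events.Event) {
    log.Printf("seq=%d type=%s at=%s", e.Sequence, e.Type, e.Timestamp)
})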

Events are categorized by their domain:

Category       Events                        Purpose
Conversation   started, ended                Session lifecycle
Message        created, updated              Content exchange
Audio          input, output                 Voice data
Provider       call.started, call.completed  LLM API calls
Tool           call.started, call.completed  Function execution
Validation     started, completed            Guardrail checks

Each event contains:

type Event struct {
    Type      EventType   // Event category
    Timestamp time.Time   // When it occurred
    SessionID string      // Session identifier
    Sequence  int64       // Ordering guarantee
    Data      interface{} // Event-specific payload
}
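
Sequence, not Timestamp, is the authoritative ordering: concurrent emitters can produce identical timestamps. A minimal sketch of restoring order over a slice of events (evts is a hypothetical []Event):

// Restore total order by Sequence; timestamps alone can collide.
sort.Slice(evts, func(i, j int) bool {
    return evts[i].Sequence < evts[j].Sequence
})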

Events are persisted in JSONL (JSON Lines) format:

{"seq":1,"event":{"type":"conversation.started","timestamp":"2024-01-15T10:30:00Z",...}}
{"seq":2,"event":{"type":"message.created","timestamp":"2024-01-15T10:30:00.1Z",...}}
{"seq":3,"event":{"type":"audio.input","timestamp":"2024-01-15T10:30:00.2Z",...}}

Benefits:

  • Append-only: Safe for concurrent writes
  • Streamable: Process without loading the entire file (see the reader sketch below)
  • Human-readable: Easy debugging
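
As an illustration of the streamable property, a full log can be processed one line at a time with only the standard library (the seq/event envelope mirrors the sample above; this is a sketch, not PromptKit's actual loader):

package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "log"
    "os"
)

type envelope struct {
    Seq   int64           `json:"seq"`
    Event json.RawMessage `json:"event"` // decoded lazily per event type
}

func main() {
    f, err := os.Open("session.jsonl")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    sc := bufio.NewScanner(f)
    sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) // audio events can be large
    for sc.Scan() {
        var env envelope
        if err := json.Unmarshal(sc.Bytes(), &env); err != nil {
            log.Fatal(err)
        }
        fmt.Printf("seq=%d payload=%dB\n", env.Seq, len(env.Event))
    }
    if err := sc.Err(); err != nil {
        log.Fatal(err)
    }
}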

For export/import, recordings use a structured format:

{"type":"metadata","session_id":"...","start_time":"...","duration":"..."}
{"type":"event","offset":"100ms","event_type":"message.created",...}
{"type":"event","offset":"200ms","event_type":"audio.input",...}

The loader auto-detects both formats:

rec, err := recording.Load("session.jsonl") // Works with either format
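
One way such detection could work (a sketch, not the actual loader): the export format's first line is a {"type":"metadata",...} record, while the raw store's first line is a {"seq":1,...} envelope.

// Hypothetical probe of the first line to pick a decoder.
func looksLikeExport(firstLine []byte) bool {
    var probe struct {
        Type string `json:"type"`
    }
    if err := json.Unmarshal(firstLine, &probe); err != nil {
        return false
    }
    return probe.Type == "metadata"
}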

For recordings with audio/video, the MediaTimeline provides synchronized access:

Recording
└── MediaTimeline
    ├── TrackAudioInput  ──► User speech segments
    ├── TrackAudioOutput ──► Assistant speech segments
    └── TrackVideo       ──► Video frames (if present)

Each track contains time-ordered segments:

type MediaSegment struct {
    Offset   time.Duration // Start time relative to session
    Duration time.Duration // Segment length
    Data     []byte        // Raw media data
    Format   AudioFormat   // Sample rate, channels, encoding
}
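
Because each segment carries both Offset and Duration, activity queries fall out directly. A hypothetical helper (not part of the documented API):

// Reports whether any segment of a track covers the given position.
func activeAt(segments []MediaSegment, at time.Duration) bool {
    for _, s := range segments {
        if at >= s.Offset && at < s.Offset+s.Duration {
            return true
        }
    }
    return false
}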

Audio can be exported as standard WAV files:

timeline := rec.ToMediaTimeline(nil)
timeline.ExportAudioToWAV(events.TrackAudioInput, "user.wav")
timeline.ExportAudioToWAV(events.TrackAudioOutput, "assistant.wav")

The export process:

  1. Collects all segments for the track
  2. Concatenates PCM data in time order
  3. Writes a RIFF/WAVE header with format info (sketched below)
  4. Outputs standard 16-bit PCM WAV
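
Step 3 can be illustrated with the canonical 44-byte header for 16-bit PCM; this standalone sketch is independent of PromptKit's actual writer:

import (
    "bytes"
    "encoding/binary"
    "io"
)

// writeWAVHeader emits a RIFF/WAVE header for 16-bit PCM ahead of dataLen
// bytes of concatenated sample data.
func writeWAVHeader(w io.Writer, sampleRate, channels int, dataLen uint32) error {
    var h bytes.Buffer
    h.WriteString("RIFF")
    binary.Write(&h, binary.LittleEndian, 36+dataLen) // file size minus 8
    h.WriteString("WAVE")
    h.WriteString("fmt ")
    binary.Write(&h, binary.LittleEndian, uint32(16)) // fmt chunk length
    binary.Write(&h, binary.LittleEndian, uint16(1))  // audio format: PCM
    binary.Write(&h, binary.LittleEndian, uint16(channels))
    binary.Write(&h, binary.LittleEndian, uint32(sampleRate))
    binary.Write(&h, binary.LittleEndian, uint32(sampleRate*channels*2)) // byte rate
    binary.Write(&h, binary.LittleEndian, uint16(channels*2))            // block align
    binary.Write(&h, binary.LittleEndian, uint16(16))                    // bits per sample
    h.WriteString("data")
    binary.Write(&h, binary.LittleEndian, dataLen)
    _, err := w.Write(h.Bytes())
    return err
}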

The ReplayPlayer provides synchronized access to recordings:

player, _ := recording.NewReplayPlayer(rec)
// Seek to any position
player.Seek(5 * time.Second)
// Query state at current position
state := player.GetState()
// state.CurrentEvents - events at this moment
// state.RecentEvents - events in last 2 seconds
// state.Messages - all messages up to this point
// state.AudioInputActive - is user speaking?
// state.AudioOutputActive - is assistant speaking?

At any position, you can access:

Field               Description
Position            Current offset from session start
Timestamp           Absolute timestamp
CurrentEvents       Events within 50ms of position
RecentEvents        Events in last 2 seconds
Messages            Accumulated conversation
AudioInputActive    User audio present
AudioOutputActive   Assistant audio present
ActiveAnnotations   Annotations in scope
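
Putting Seek and GetState together, a review tool might scrub through a session in fixed steps (rec.Duration() is an assumed accessor for total length):

// Walk the recording in 100ms steps and report speech activity.
// rec.Duration() is assumed; substitute however total length is exposed.
for pos := time.Duration(0); pos <= rec.Duration(); pos += 100 * time.Millisecond {
    player.Seek(pos)
    state := player.GetState()
    if state.AudioInputActive {
        fmt.Printf("%s: user speaking, %d messages so far\n", pos, len(state.Messages))
    }
}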

Annotations can be attached to recordings for review:

player.SetAnnotations([]*annotations.Annotation{
    // Session-level annotation
    annotations.ForSession().WithScore("quality", 0.92),
    // Time-range annotation
    annotations.InTimeRange(start, end).WithComment("Good response"),
    // Event-targeted annotation
    annotations.ForEvent(eventSeq).WithLabel("category", "greeting"),
})

During playback, active annotations are included in state queries.
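
For example (field name per the table above):

// Sketch: annotations whose scope covers the current position appear in
// the state; start is the range attached in the snippet above.
player.Seek(start + time.Second)
for _, a := range player.GetState().ActiveAnnotations {
    fmt.Println(a) // e.g. the "Good response" comment
}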

For deterministic replay, the ReplayProvider simulates the original provider:

provider, _ := replay.NewProviderFromRecording(rec)
// Returns the same responses as the original session
response, _ := provider.Complete(ctx, messages, opts)

Use cases:

  • Regression testing: Verify behavior against known-good responses (see the test sketch below)
  • Debugging: Reproduce exact conversation flows
  • Offline testing: No API calls needed
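
A regression test built on this might look like the following sketch (the fixture path, messages, and opts are placeholders for your scenario):

func TestGreetingFlow(t *testing.T) {
    rec, err := recording.Load("testdata/greeting.jsonl") // placeholder fixture
    if err != nil {
        t.Fatal(err)
    }
    provider, err := replay.NewProviderFromRecording(rec)
    if err != nil {
        t.Fatal(err)
    }
    // The replay provider answers with the recorded responses; no API calls.
    response, err := provider.Complete(context.Background(), messages, opts)
    if err != nil {
        t.Fatal(err)
    }
    if response == nil {
        t.Fatal("expected a replayed response")
    }
}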

End to end, events flow from the live session through storage to replay:

┌─────────────────────────────────────────────────────────────────┐
│                          LIVE SESSION                           │
│                                                                 │
│     User ──► Pipeline ──► Provider ──► Response ──► User        │
│                 │            │                                  │
│                 ▼            ▼                                  │
│              Emitter ──────────────────► EventBus               │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                             STORAGE                             │
│                                                                 │
│          EventBus ──► FileEventStore ──► session.jsonl          │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                             REPLAY                              │
│                                                                 │
│       session.jsonl ──► SessionRecording ──► ReplayPlayer       │
│              │                                     │            │
│              ▼                                     ▼            │
│        MediaTimeline                         Synchronized       │
│        (WAV export)                            Playback         │
└─────────────────────────────────────────────────────────────────┘

Recording is designed to stay out of the request path:

  • CPU: Minimal; events are queued and written asynchronously
  • Memory: Bounded; segments are written incrementally
  • Disk: Proportional to conversation length and audio duration

Audio is stored as base64-encoded PCM:

  • Input: 16kHz, 16-bit mono = ~32KB/second
  • Output: 24kHz, 16-bit mono = ~48KB/second
  • Base64 encoding adds ~33% overhead

A 5-minute voice conversation generates approximately:

  • Raw audio: ~24MB
  • JSONL with metadata: ~32MB
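
The arithmetic behind those numbers, assuming both audio streams are active for the full five minutes:

// Back-of-the-envelope sizing for a 5-minute full-duplex conversation.
const (
    inRate  = 16000 * 2 // 16 kHz, 16-bit mono input  = 32,000 B/s
    outRate = 24000 * 2 // 24 kHz, 16-bit mono output = 48,000 B/s
    seconds = 5 * 60
)
raw := (inRate + outRate) * seconds // 24,000,000 B, roughly 24 MB of PCM
enc := raw * 4 / 3                  // base64 overhead brings it to roughly 32 MB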

To keep storage manageable:

  1. Disable for CI: Skip recording for quick validation runs
  2. Compress archives: JSONL compresses well (70-80% reduction)
  3. Retention policies: Auto-delete old recordings
  4. Selective recording: Only enable for specific scenarios