
Realtime Video Streaming Example

This example demonstrates how to stream video frames to an LLM for real-time vision analysis using PromptKit’s duplex session capabilities.

  • OpenDuplex with video: Opening a bidirectional streaming session for video
  • WithStreamingVideo: Configuring frame rate limiting and preprocessing
  • SendFrame(): Sending individual image frames to the session
  • Frame rate limiting: Automatic dropping of excess frames to match LLM processing speed

Typical use cases include:

  • Webcam analysis: Real-time description of what a webcam sees
  • Screen sharing: Analyzing screen content as it changes
  • Security monitoring: Continuous analysis of camera feeds
  • Accessibility: Describing visual content for users with visual impairments

Run the example:

export GEMINI_API_KEY=your-key
go run .

1. Open a Duplex Session with Video Config

conv, err := sdk.OpenDuplex(
    "./pack.json",
    "vision-stream",
    sdk.WithStreamingVideo(&sdk.VideoStreamConfig{
        TargetFPS:    1.0,  // Process 1 frame per second
        MaxWidth:     1024, // Resize large frames
        MaxHeight:    1024,
        Quality:      85, // JPEG quality
        EnableResize: true,
    }),
)
if err != nil {
    log.Fatal(err)
}

Each captured image is wrapped in a session.ImageFrame and sent with SendFrame():

frame := &session.ImageFrame{
    Data:      jpegBytes,
    MIMEType:  "image/jpeg",
    Width:     640,
    Height:    480,
    FrameNum:  frameCount,
    Timestamp: time.Now(),
}
if err := conv.SendFrame(ctx, frame); err != nil {
    log.Printf("send frame: %v", err)
}

Responses stream back as chunks; read them from conv.Response():

for chunk := range conv.Response() {
    if chunk.Content != "" {
        fmt.Print(chunk.Content)
    }
}
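
In practice, SendFrame() and the Response() loop run concurrently: you keep pushing frames while draining the model's output. Below is a minimal sketch of that wiring; frames is a hypothetical source of *session.ImageFrame from your capture loop, and shutdown and context cancellation are left to your application.

// Drain responses in a background goroutine while frames are sent.
go func() {
    for chunk := range conv.Response() {
        if chunk.Content != "" {
            fmt.Print(chunk.Content)
        }
    }
}()

// Send frames from the main goroutine.
for _, f := range frames { // frames: hypothetical slice of *session.ImageFrame
    if err := conv.SendFrame(ctx, f); err != nil {
        log.Printf("send frame: %v", err)
        break
    }
}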

The TargetFPS setting automatically drops excess frames:

  • Webcam at 30 FPS → With TargetFPS: 1.0, only ~1 frame/second reaches the LLM
  • Latest frame wins: The most recent frame is kept; older, unprocessed frames are dropped
  • Reduces costs: Fewer frames = fewer tokens = lower API costs
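
Conceptually, this is a "newest frame wins" rate limiter. The SDK handles it for you, but the idea can be sketched with a small select loop. This is illustrative only, not PromptKit's internals, and frames is a hypothetical channel fed by your capture loop at whatever rate it produces images:

// Illustrative only: keep the newest frame, forward at most one per interval.
var latest *session.ImageFrame

ticker := time.NewTicker(time.Second) // TargetFPS: 1.0 -> one frame per second
defer ticker.Stop()

for {
    select {
    case f := <-frames:
        latest = f // overwrite: older unsent frames are simply dropped
    case <-ticker.C:
        if latest != nil {
            _ = conv.SendFrame(ctx, latest) // error handling elided
            latest = nil
        }
    case <-ctx.Done():
        return
    }
}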

For encoded video segments (H.264, VP8, etc.), use SendVideoChunk():

chunk := &session.VideoChunk{
    Data:       h264Data,
    MIMEType:   "video/h264",
    ChunkIndex: 0,
    IsKeyFrame: true,
    Timestamp:  time.Now(),
}
if err := conv.SendVideoChunk(ctx, chunk); err != nil {
    log.Printf("send video chunk: %v", err)
}
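
For a continuous stream of segments, increment ChunkIndex as you go. A sketch, where nextSegment is a hypothetical function that returns the next encoded segment from your encoder and whether it begins with a keyframe:

for i := 0; ; i++ {
    data, keyFrame, ok := nextSegment() // hypothetical: pulls segments from your encoder
    if !ok {
        break
    }
    chunk := &session.VideoChunk{
        Data:       data,
        MIMEType:   "video/h264",
        ChunkIndex: i,
        IsKeyFrame: keyFrame,
        Timestamp:  time.Now(),
    }
    if err := conv.SendVideoChunk(ctx, chunk); err != nil {
        log.Printf("send chunk %d: %v", i, err)
        break
    }
}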

In a real application, replace the simulated frames with actual capture:

import "gocv.io/x/gocv"

webcam, err := gocv.VideoCaptureDevice(0)
if err != nil {
    log.Fatal(err)
}
defer webcam.Close()

mat := gocv.NewMat()
defer mat.Close()

for webcam.Read(&mat) {
    // Encode the captured frame to JPEG
    buf, err := gocv.IMEncode(".jpg", mat)
    if err != nil {
        continue
    }
    frame := &session.ImageFrame{
        Data:      append([]byte(nil), buf.GetBytes()...), // copy; the native buffer is freed below
        MIMEType:  "image/jpeg",
        Width:     mat.Cols(),
        Height:    mat.Rows(),
        Timestamp: time.Now(),
    }
    buf.Close()
    if err := conv.SendFrame(ctx, frame); err != nil {
        log.Printf("send frame: %v", err)
    }
}
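
Even though the SDK drops excess frames, encoding 30 webcam frames per second only to discard most of them wastes CPU and bandwidth. One option is to pace the capture loop yourself with a time.Ticker; here encodeAndSend is a hypothetical helper wrapping the JPEG-encode and SendFrame steps from the loop above:

ticker := time.NewTicker(time.Second) // roughly match TargetFPS: 1.0
defer ticker.Stop()

for range ticker.C {
    if !webcam.Read(&mat) || mat.Empty() {
        break
    }
    encodeAndSend(ctx, conv, mat) // hypothetical helper: IMEncode + SendFrame as above
}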

Currently, realtime video streaming works best with providers that support bidirectional streaming:

  • Gemini Live API: Full support for real-time vision (when available)
  • OpenAI Realtime: Audio-focused, video support may be added later

The SDK is provider-agnostic: video frames flow through when the provider supports them.
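
Because support varies by provider, it is worth handling the failure path explicitly instead of assuming frames will always be accepted. A defensive sketch; the exact error returned is provider- and SDK-version-specific:

// Fall back gracefully if the configured provider rejects video input.
if err := conv.SendFrame(ctx, frame); err != nil {
    log.Printf("video streaming unavailable, continuing without frames: %v", err)
    // e.g. stop the capture loop and continue with text or audio only
}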