This example demonstrates Voice Activity Detection (VAD) using PromptKit’s audio package.

Features

Running

cd sdk/examples/vad-demo
go run .

This example uses simulated audio data, so no microphone is required.
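As a rough illustration of what simulated audio can look like, here is a minimal sketch that assumes 16 kHz mono 16-bit PCM (matching the default SampleRate). It builds silence, a sine tone standing in for speech, then silence again. The helper `simulateSpeech` is hypothetical and is not taken from the demo's source, which may generate its audio differently.

```go
package main

import (
	"fmt"
	"math"
)

// simulateSpeech is a hypothetical helper (not part of PromptKit). It returns
// mono int16 PCM samples: silence, a sine tone standing in for speech, then silence.
// amplitude is a fraction of full scale (0..1).
func simulateSpeech(sampleRate int, silenceSecs, toneSecs, freqHz, amplitude float64) []int16 {
	silence := int(math.Round(silenceSecs * float64(sampleRate)))
	tone := int(math.Round(toneSecs * float64(sampleRate)))
	samples := make([]int16, 0, 2*silence+tone)

	// Leading silence: all-zero samples.
	samples = append(samples, make([]int16, silence)...)

	// Tone section: a pure sine wave at freqHz.
	for i := 0; i < tone; i++ {
		t := float64(i) / float64(sampleRate)
		v := amplitude * math.Sin(2*math.Pi*freqHz*t)
		samples = append(samples, int16(v*math.MaxInt16))
	}

	// Trailing silence.
	samples = append(samples, make([]int16, silence)...)
	return samples
}

func main() {
	// 0.5 s silence + 1 s of 440 Hz tone + 0.5 s silence.
	pcm := simulateSpeech(16000, 0.5, 1.0, 440, 0.3)
	fmt.Printf("%d samples (%.2f s at 16 kHz)\n", len(pcm), float64(len(pcm))/16000)
}
```

Running it prints `32000 samples (2.00 s at 16 kHz)`. A silence/tone/silence pattern like this gives a VAD one clear speech segment to detect.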

VAD States

State      Description
quiet      No voice activity detected
starting   Voice beginning (within start threshold)
speaking   Active speech detected
stopping   Voice ending (within stop threshold)

Configuration

Default Parameters

params := audio.DefaultVADParams()
// Confidence: 0.5
// StartSecs: 0.2
// StopSecs: 0.8
// MinVolume: 0.01
// SampleRate: 16000
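StartSecs and StopSecs are durations, so at the default SampleRate they correspond to fixed sample counts. The arithmetic below also assumes a 20 ms analysis frame. That frame size is an illustrative assumption; this README does not state what frame size the VAD uses.

```go
package main

import (
	"fmt"
	"math"
)

// secsToSamples converts a duration in seconds to a whole number of samples.
func secsToSamples(secs float64, sampleRate int) int {
	return int(math.Round(secs * float64(sampleRate)))
}

func main() {
	const sampleRate = 16000 // matches the default SampleRate
	const frameSecs = 0.02   // assumed 20 ms analysis frame (illustrative)

	frame := secsToSamples(frameSecs, sampleRate)
	start := secsToSamples(0.2, sampleRate) // default StartSecs
	stop := secsToSamples(0.8, sampleRate)  // default StopSecs

	fmt.Printf("frame: %d samples\n", frame)
	fmt.Printf("StartSecs 0.2 -> %d samples = %d frames\n", start, start/frame)
	fmt.Printf("StopSecs  0.8 -> %d samples = %d frames\n", stop, stop/frame)
}
```

With these defaults, speech must last 3200 samples (10 frames of 20 ms) to register. A pause must last 12800 samples (40 frames) before the utterance is considered finished.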

Strict VAD (noisy environments)

params := audio.VADParams{
    Confidence: 0.7,   // Higher confidence required
    StartSecs:  0.3,   // Longer speech to trigger
    StopSecs:   1.2,   // Allow longer pauses
    MinVolume:  0.02,  // Higher volume threshold
    SampleRate: 16000,
}

Sensitive VAD (quiet environments)

params := audio.VADParams{
    Confidence: 0.3,   // More sensitive
    StartSecs:  0.1,   // Quick start detection
    StopSecs:   0.5,   // Quick end detection
    MinVolume:  0.005, // Detect quiet speech
    SampleRate: 16000,
}

State Change Events

vad, err := audio.NewSimpleVAD(audio.DefaultVADParams())
if err != nil {
    log.Fatal(err)
}
stateChanges := vad.OnStateChange()

// Print every state transition as it arrives
go func() {
    for event := range stateChanges {
        fmt.Printf("State: %s -> %s (confidence: %.2f)\n",
            event.PrevState, event.State, event.Confidence)
    }
}()
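Because the events are consumed in a goroutine, a short-lived program can return before anything is printed. The self-contained sketch below shows the same channel-consumer pattern with a `sync.WaitGroup` so the caller waits until the channel is closed. The `stateChange` type and the simulated producer are hypothetical stand-ins for the SDK's event type and VAD. The sketch also assumes the SDK closes the channel when it is done, which this README does not confirm.

```go
package main

import (
	"fmt"
	"sync"
)

// stateChange is a hypothetical stand-in for the events from vad.OnStateChange();
// field names follow the snippet above.
type stateChange struct {
	PrevState, State string
	Confidence       float64
}

// run feeds events through a channel to a consumer goroutine and returns the
// lines it formatted, waiting for the consumer to finish first.
func run(evts []stateChange) []string {
	ch := make(chan stateChange, len(evts)) // buffered so the producer never blocks
	var wg sync.WaitGroup
	var lines []string

	wg.Add(1)
	go func() {
		defer wg.Done()
		for e := range ch { // the loop ends once ch is closed
			lines = append(lines, fmt.Sprintf("State: %s -> %s (confidence: %.2f)",
				e.PrevState, e.State, e.Confidence))
		}
	}()

	for _, e := range evts {
		ch <- e
	}
	close(ch)
	wg.Wait() // without this, the caller could return before the consumer finishes
	return lines
}

func main() {
	// Simulated quiet -> speaking -> quiet cycle.
	for _, l := range run([]stateChange{
		{"quiet", "starting", 0.62},
		{"starting", "speaking", 0.81},
		{"speaking", "stopping", 0.40},
		{"stopping", "quiet", 0.12},
	}) {
		fmt.Println(l)
	}
}
```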

Integration with SDK

VAD is typically used with audio sessions:

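conv, err := sdk.Open("./pack.json", "assistant")
if err != nil {
    log.Fatal(err)
}

// NewSimpleVAD returns (vad, error), so create the VAD before passing it in
vad, err := audio.NewSimpleVAD(audio.DefaultVADParams())
if err != nil {
    log.Fatal(err)
}

// Create audio session with VAD
session, err := conv.OpenAudioSession(ctx, sdk.WithSessionVAD(vad))
if err != nil {
    log.Fatal(err)
}

// VAD automatically processes audio chunks (ctx and audioChunk defined by the caller)
session.SendChunk(ctx, audioChunk)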
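Since audio is sent one chunk at a time, a continuous stream is usually split into fixed-size pieces first. The sketch below makes several assumptions: `audioChunk` is a byte slice of raw 16-bit little-endian mono PCM, and each chunk covers 20 ms (640 bytes at 16 kHz). The format and chunk size are assumptions, and the helpers `pcmToBytes` and `chunk` are hypothetical. Check what `SendChunk` actually expects before relying on this.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pcmToBytes encodes int16 samples as little-endian bytes (assumed wire format).
func pcmToBytes(samples []int16) []byte {
	out := make([]byte, 2*len(samples))
	for i, s := range samples {
		binary.LittleEndian.PutUint16(out[2*i:], uint16(s))
	}
	return out
}

// chunk splits data into pieces of at most size bytes; the last may be shorter.
func chunk(data []byte, size int) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := size
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

func main() {
	const sampleRate = 16000
	const chunkMs = 20
	chunkBytes := sampleRate * chunkMs / 1000 * 2 // 320 samples * 2 bytes = 640

	samples := make([]int16, sampleRate) // 1 s of (silent) audio
	chunks := chunk(pcmToBytes(samples), chunkBytes)
	fmt.Printf("%d chunks of %d bytes\n", len(chunks), chunkBytes)
	// Each chunks[i] would then be passed to session.SendChunk(ctx, chunks[i]).
}
```

This prints `50 chunks of 640 bytes`. Smaller chunks let the VAD react sooner, at the cost of more calls.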

Notes