Voice Activity Detection (VAD) Demo
This example demonstrates Voice Activity Detection using PromptKit’s audio package.
Features
- SimpleVAD: Basic voice activity detection using RMS energy analysis
- State Tracking: Monitor transitions between quiet/starting/speaking/stopping
- Configurable Parameters: Tune sensitivity for different environments
- Event Notifications: React to state changes in real-time
Running
```sh
cd sdk/examples/vad-demo
go run .
```

This example runs with simulated audio data; no microphone is required.
VAD States
| State | Description |
|---|---|
| quiet | No voice activity detected |
| starting | Voice beginning (within start threshold) |
| speaking | Active speech detected |
| stopping | Voice ending (within stop threshold) |
Configuration
Default Parameters

```go
params := audio.DefaultVADParams()
// Confidence: 0.5
// StartSecs:  0.2
// StopSecs:   0.8
// MinVolume:  0.01
// SampleRate: 16000
```

Strict VAD (noisy environments)

```go
params := audio.VADParams{
	Confidence: 0.7,  // Higher confidence required
	StartSecs:  0.3,  // Longer speech to trigger
	StopSecs:   1.2,  // Allow longer pauses
	MinVolume:  0.02, // Higher volume threshold
	SampleRate: 16000,
}
```

Sensitive VAD (quiet environments)

```go
params := audio.VADParams{
	Confidence: 0.3,   // More sensitive
	StartSecs:  0.1,   // Quick start detection
	StopSecs:   0.5,   // Quick end detection
	MinVolume:  0.005, // Detect quiet speech
	SampleRate: 16000,
}
```

State Change Events
```go
vad, _ := audio.NewSimpleVAD(audio.DefaultVADParams())
stateChanges := vad.OnStateChange()

go func() {
	for event := range stateChanges {
		fmt.Printf("State: %s -> %s (confidence: %.2f)\n",
			event.PrevState, event.State, event.Confidence)
	}
}()
```

Integration with SDK
VAD is typically used with audio sessions:

```go
conv, _ := sdk.Open("./pack.json", "assistant")

// NewSimpleVAD returns a value and an error, so construct the VAD first
vad, _ := audio.NewSimpleVAD(audio.DefaultVADParams())

// Create audio session with VAD
session, _ := conv.OpenAudioSession(ctx,
	sdk.WithSessionVAD(vad),
)

// VAD automatically processes audio chunks
session.SendChunk(ctx, audioChunk)
```

- VAD is energy-based (RMS volume analysis)
- Works with 16-bit PCM audio at configurable sample rates
- Default sample rate is 16 kHz (common for speech recognition)
- Transition thresholds prevent false positives from brief sounds