Voice Interview System
A comprehensive voice-enabled interview demonstration showcasing PromptKit’s stage-based pipeline architecture, with support for streaming, voice activity detection (VAD), text-to-speech (TTS), and audio streaming models (ASM).
Features
- Dual Audio Modes
  - ASM Mode: Native bidirectional audio streaming with Gemini 2.0 (continuous, real-time)
  - VAD Mode: Voice Activity Detection with turn-based processing and TTS output
- Stage-Based Pipeline Architecture
  - Demonstrates PromptKit’s streaming pipeline end to end
  - Real-time audio processing through pipeline stages
  - Works with both provider modes: native audio (ASM) and text LLM plus TTS (VAD)
- Multiple Interview Topics
  - Classic Rock Music
  - Space Exploration
  - Programming & Computer Science
  - World History
  - Movies & Cinema
- Optional Webcam Integration
  - Sends periodic webcam frames for multimodal context
  - Visual engagement analysis
- Rich Terminal UI
  - Real-time audio level visualization
  - Progress tracking with score display
  - Live transcript display
  - Interactive interface built with Bubble Tea
Requirements
System Dependencies
```bash
# macOS
brew install portaudio ffmpeg

# Ubuntu/Debian
sudo apt-get install portaudio19-dev ffmpeg

# Windows
# Download PortAudio from http://www.portaudio.com/
# Download ffmpeg from https://ffmpeg.org/download.html
```
Environment
```bash
export GEMINI_API_KEY=your_api_key_here
```
Quick Start
```bash
# Navigate to the example directory
cd sdk/examples/voice-interview

# Run with default settings (ASM mode, Classic Rock topic)
go run .

# Run with a specific topic
go run . --topic programming

# Run in VAD mode (turn-based with TTS)
go run . --mode vad --topic space

# Enable webcam for visual context
go run . --webcam --topic movies

# List all available topics
go run . --list-topics
```
Command-Line Options
| Flag | Default | Description |
|---|---|---|
| `--mode` | `asm` | Audio mode: `asm` (native audio) or `vad` (turn-based with TTS) |
| `--topic` | `classic-rock` | Interview topic (see `--list-topics`) |
| `--webcam` | `false` | Enable webcam for visual context |
| `--pack` | `./interview.pack.json` | Path to the PromptPack file |
| `--no-ui` | `false` | Disable the rich terminal UI |
| `--verbose` | `false` | Enable verbose logging |
| `--list-topics` | - | List available interview topics and exit |
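The flags above map naturally onto Go's standard `flag` package. The following is a minimal sketch of how they could be declared and validated; the `Options` struct and `parseOptions` function are hypothetical, and the example's actual `main.go` may parse its flags differently. Note that Go's `flag` package accepts both `-mode` and `--mode`.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// Options mirrors the command-line table. The struct and field names are
// hypothetical; the example's main.go may use different ones.
type Options struct {
	Mode       string
	Topic      string
	Webcam     bool
	Pack       string
	NoUI       bool
	Verbose    bool
	ListTopics bool
}

// parseOptions parses args (without the program name) into Options and
// rejects audio modes other than "asm" and "vad".
func parseOptions(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("voice-interview", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // errors are returned to the caller instead of printed
	fs.StringVar(&o.Mode, "mode", "asm", "audio mode: asm or vad")
	fs.StringVar(&o.Topic, "topic", "classic-rock", "interview topic")
	fs.BoolVar(&o.Webcam, "webcam", false, "enable webcam for visual context")
	fs.StringVar(&o.Pack, "pack", "./interview.pack.json", "path to PromptPack file")
	fs.BoolVar(&o.NoUI, "no-ui", false, "disable rich terminal UI")
	fs.BoolVar(&o.Verbose, "verbose", false, "enable verbose logging")
	fs.BoolVar(&o.ListTopics, "list-topics", false, "list topics and exit")
	if err := fs.Parse(args); err != nil {
		return Options{}, err
	}
	if o.Mode != "asm" && o.Mode != "vad" {
		return Options{}, fmt.Errorf("invalid --mode %q: want asm or vad", o.Mode)
	}
	return o, nil
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```

Using `flag.ContinueOnError` on a dedicated `FlagSet` keeps parsing testable: invalid input comes back as an error rather than terminating the process.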
Architecture
Pipeline Stages
This example demonstrates the stage-based pipeline architecture:
```text
┌─────────────────────────────────────────────────────────────────┐
│                    Voice Interview Pipeline                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐     │
│    │    Audio     │    │     VAD      │    │   Provider   │     │
│    │   Capture    │───▶│    Stage     │───▶│    Stage     │     │
│    │    Stage     │    │ (if VAD mode)│    │  (ASM/Text)  │     │
│    └──────────────┘    └──────────────┘    └──────────────┘     │
│           │                                       │             │
│           │                                       ▼             │
│           │            ┌──────────────┐    ┌──────────────┐     │
│           │            │     TTS      │◀───│   Response   │     │
│           │            │    Stage     │    │  Processing  │     │
│           │            │ (if VAD mode)│    │              │     │
│           │            └──────────────┘    └──────────────┘     │
│           │                   │                                 │
│           ▼                   ▼                                 │
│    ┌──────────────────────────────────────────────────────┐     │
│    │                    Audio Playback                    │     │
│    └──────────────────────────────────────────────────────┘     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
ASM Mode (Audio Streaming Model)
In ASM mode, the pipeline uses native bidirectional audio streaming:
- Continuous streaming: Audio flows continuously in both directions
- No turn detection needed: The model handles conversation flow
- Lower latency: Real-time response without waiting for turn boundaries
- Requires: Gemini 2.0 Flash Exp or similar ASM-capable models
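The key structural difference from VAD mode is that capture and playback run concurrently, with no turn boundary in between. A minimal Go sketch of that shape using goroutines and channels follows; `Chunk`, `runDuplex`, `echoSession`, and the echo stand-in for the model are all hypothetical and are not PromptKit's API.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Chunk is a hypothetical block of 16-bit PCM samples.
type Chunk []int16

// runDuplex pumps microphone chunks upstream and model audio downstream
// at the same time. Neither direction waits for the other or for a turn
// boundary; both stop when their input closes or ctx is cancelled.
func runDuplex(ctx context.Context, mic <-chan Chunk, up chan<- Chunk, down <-chan Chunk, play func(Chunk)) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // upstream: capture -> provider
		defer wg.Done()
		defer close(up) // signal end of input to the provider
		for {
			select {
			case <-ctx.Done():
				return
			case c, ok := <-mic:
				if !ok {
					return
				}
				select {
				case up <- c:
				case <-ctx.Done():
					return
				}
			}
		}
	}()
	go func() { // downstream: provider -> playback
		defer wg.Done()
		for {
			select {
			case <-ctx.Done():
				return
			case c, ok := <-down:
				if !ok {
					return
				}
				play(c)
			}
		}
	}()
	wg.Wait()
}

// echoSession wires runDuplex to stand-ins and returns what reached playback.
func echoSession(n int) []Chunk {
	mic, up, down := make(chan Chunk), make(chan Chunk), make(chan Chunk)
	// Stand-in "model": echoes each input chunk. A real ASM provider would
	// stream generated speech on its own schedule.
	go func() {
		defer close(down)
		for c := range up {
			down <- c
		}
	}()
	// Stand-in microphone: n one-sample chunks, then end of input.
	go func() {
		defer close(mic)
		for i := 0; i < n; i++ {
			mic <- Chunk{int16(i)}
		}
	}()
	var played []Chunk
	runDuplex(context.Background(), mic, up, down, func(c Chunk) { played = append(played, c) })
	return played
}

func main() {
	fmt.Println(len(echoSession(3)))
}
```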
VAD Mode (Voice Activity Detection)
In VAD mode, the pipeline uses turn-based processing:
- Turn detection: the VAD stage detects speech/silence boundaries
- Accumulation: speech is accumulated until silence is detected
- TTS output: Text responses are converted to speech
- Works with: Any text-based LLM + TTS service
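Turn detection and accumulation can be illustrated with a simple energy-based detector. This is a sketch under stated assumptions: it treats `EnergyThreshold` (from `audio/portaudio.go`) as an RMS threshold on 16-bit samples, ends a turn after a fixed number of consecutive quiet frames, and the `TurnDetector` type is hypothetical. PromptKit's actual VAD stage may use a different detector.

```go
package main

import (
	"fmt"
	"math"
)

// EnergyThreshold matches the constant in audio/portaudio.go; using it as
// an RMS threshold is an assumption of this sketch.
const EnergyThreshold = 500

// rms returns the root-mean-square energy of a frame of 16-bit samples.
func rms(frame []int16) float64 {
	if len(frame) == 0 {
		return 0
	}
	var sum float64
	for _, s := range frame {
		v := float64(s)
		sum += v * v
	}
	return math.Sqrt(sum / float64(len(frame)))
}

// TurnDetector (hypothetical) accumulates speech and reports a finished
// turn after silenceFrames consecutive frames below the threshold.
type TurnDetector struct {
	silenceFrames int
	silentRun     int
	buf           []int16
	inSpeech      bool
}

func NewTurnDetector(silenceFrames int) *TurnDetector {
	return &TurnDetector{silenceFrames: silenceFrames}
}

// Push feeds one frame. It returns the accumulated utterance and true when
// a turn ends. Leading silence is dropped; trailing silence is kept.
func (d *TurnDetector) Push(frame []int16) ([]int16, bool) {
	loud := rms(frame) >= EnergyThreshold
	if loud {
		d.inSpeech = true
		d.silentRun = 0
	}
	if !d.inSpeech {
		return nil, false // no speech yet: nothing to accumulate
	}
	d.buf = append(d.buf, frame...)
	if !loud {
		d.silentRun++
		if d.silentRun >= d.silenceFrames {
			turn := d.buf
			d.buf, d.inSpeech, d.silentRun = nil, false, 0
			return turn, true
		}
	}
	return nil, false
}

func main() {
	speech := make([]int16, 1600) // one 100 ms frame at 16 kHz
	for i := range speech {
		speech[i] = 2000
	}
	silence := make([]int16, 1600)
	d := NewTurnDetector(3) // about 300 ms of silence ends a turn
	frames := [][]int16{silence, speech, speech, silence, silence, silence}
	for i, f := range frames {
		if turn, done := d.Push(f); done {
			fmt.Printf("turn ended at frame %d with %d samples\n", i, len(turn))
		}
	}
}
```

With the frames in `main`, the detector reports a single turn at frame 5 containing 8000 samples: the leading silent frame is dropped and the two speech frames plus three trailing silent frames are kept. The completed utterance is what would then be sent to the text LLM.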
Project Structure
```text
voice-interview/
├── main.go                 # Entry point with mode selection
├── interview.pack.json     # PromptPack configuration
├── README.md               # This file
├── audio/
│   └── portaudio.go        # Audio capture/playback
├── video/
│   └── webcam.go           # Webcam capture (optional)
├── interview/
│   ├── controller.go       # Interview orchestration
│   ├── state.go            # State management
│   └── questions.go        # Question banks
└── ui/
    └── app.go              # Bubbletea terminal UI
```
Customization
Adding New Topics
Edit `interview/questions.go` to add new question banks:
```go
func myCustomQuestions() *QuestionBank {
	return &QuestionBank{
		Topic:       "My Custom Topic",
		Description: "Description of the topic",
		Questions: []Question{
			{
				ID:       "custom-1",
				Text:     "Your question here?",
				Answer:   "Expected answer",
				Hint:     "Optional hint",
				Category: "category",
			},
			// Add more questions...
		},
	}
}
```
Then register it in `GetQuestionBank()`.
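One way the registration step might look, assuming `GetQuestionBank` looks banks up in a map keyed by the `--topic` value. The `questionBanks` map, the `my-custom` key, the `GetQuestionBank` signature shown here, and `topicKeys` are hypothetical; the real function in `interview/questions.go` may use a `switch` or a different signature.

```go
package main

import (
	"fmt"
	"sort"
)

// Question and QuestionBank mirror the fields used in the snippet above.
type Question struct {
	ID, Text, Answer, Hint, Category string
}

type QuestionBank struct {
	Topic       string
	Description string
	Questions   []Question
}

func myCustomQuestions() *QuestionBank {
	return &QuestionBank{
		Topic:       "My Custom Topic",
		Description: "Description of the topic",
		Questions:   []Question{{ID: "custom-1", Text: "Your question here?", Answer: "Expected answer"}},
	}
}

// questionBanks (hypothetical) maps --topic keys to bank constructors.
var questionBanks = map[string]func() *QuestionBank{
	"my-custom": myCustomQuestions,
}

// GetQuestionBank looks up a bank by topic key. This signature is an
// assumption; check interview/questions.go for the real one.
func GetQuestionBank(topic string) (*QuestionBank, error) {
	ctor, ok := questionBanks[topic]
	if !ok {
		return nil, fmt.Errorf("unknown topic %q", topic)
	}
	return ctor(), nil
}

// topicKeys returns registered keys in sorted order, e.g. for --list-topics.
func topicKeys() []string {
	keys := make([]string, 0, len(questionBanks))
	for k := range questionBanks {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	bank, err := GetQuestionBank("my-custom")
	if err != nil {
		panic(err)
	}
	fmt.Println(bank.Topic, len(bank.Questions), topicKeys())
}
```

Storing constructors rather than built banks means each interview gets a fresh `QuestionBank`, and sorting the keys keeps a topic listing stable despite Go's randomized map iteration order.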
Modifying the Interview Flow
The interview behavior is defined in `interview.pack.json`. Modify the system template to change:
- Interviewer personality
- Scoring guidelines
- Feedback style
- Response format
Custom Audio Configuration
Adjust audio settings in `audio/portaudio.go`:
```go
const (
	InputSampleRate      = 16000 // Microphone sample rate (Hz)
	OutputSampleRate     = 24000 // Speaker sample rate (Hz)
	Channels             = 1     // Mono audio
	InputFramesPerBuffer = 1600  // 100ms chunks at 16 kHz
	EnergyThreshold      = 500   // VAD sensitivity
)
```
Troubleshooting
No Audio Input
- Check microphone permissions in system settings
- Verify the PortAudio installation (macOS): `brew info portaudio`
- List audio devices: the app shows available devices on startup
Webcam Not Working
- Ensure ffmpeg is installed: `ffmpeg -version`
- Check camera permissions
- Try a different device index (the app uses device 0 by default)
API Errors
- Verify that `GEMINI_API_KEY` is set correctly
- Check API quota and rate limits
- Ensure you have access to the required models:
  - ASM mode: `gemini-2.0-flash-exp`
  - VAD mode: `gemini-2.5-flash`
UI Display Issues
If the rich UI doesn’t render correctly, run with the `--no-ui` flag for plain terminal output.
Example Session
```text
╔══════════════════════════════════════════════════════════════╗
║  🎤 Voice Interview System - PromptKit Demo                  ║
╠══════════════════════════════════════════════════════════════╣
║  Topic: Classic Rock Music                                   ║
║  Mode: ASM (Native Audio)                                    ║
║  Questions: 5                                                ║
╠══════════════════════════════════════════════════════════════╣
║  Controls:                                                   ║
║  • Speak naturally into your microphone                      ║
║  • Press Ctrl+C to end the interview                         ║
╚══════════════════════════════════════════════════════════════╝

🎤 [████████████████░░░░░░░░░░░░░░] 53%

Question 1 of 5
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Q1: Which band released the album 'Dark Side of the Moon' in 1973?

🤖 That's correct! Pink Floyd released this iconic album...
👤 Pink Floyd

Score: 10/50 │ Progress: 20%
```
Related Examples
- `duplex-streaming` - Basic duplex streaming example
- `streaming` - Text streaming example
- `multimodal` - Image/audio input example
License
This example is part of PromptKit and is available under the same license.