A voice-enabled interview demo that showcases PromptKit’s stage-based pipeline architecture, with support for streaming, voice activity detection (VAD), text-to-speech (TTS), and Audio Streaming Model (ASM) providers.
Features
- Dual Audio Modes
  - ASM Mode: Native bidirectional audio streaming with Gemini 2.0 (continuous, real-time)
  - VAD Mode: Voice Activity Detection with turn-based processing and TTS output
- Stage-Based Pipeline Architecture
  - Demonstrates the full power of PromptKit’s streaming pipeline
  - Real-time audio processing through pipeline stages
  - Seamless integration with multiple provider modes
- Multiple Interview Topics
  - Classic Rock Music
  - Space Exploration
  - Programming & Computer Science
  - World History
  - Movies & Cinema
- Optional Webcam Integration
  - Send periodic webcam frames for multimodal context
  - Visual engagement analysis
- Rich Terminal UI
  - Real-time audio level visualization
  - Progress tracking with score display
  - Live transcript display
  - Beautiful, interactive interface using Bubbletea
Requirements
System Dependencies
# macOS
brew install portaudio ffmpeg
# Ubuntu/Debian
sudo apt-get install portaudio19-dev ffmpeg
# Windows
# Download PortAudio from http://www.portaudio.com/
# Download ffmpeg from https://ffmpeg.org/download.html
Environment
export GEMINI_API_KEY=your_api_key_here
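A program that depends on this variable can fail early with a clear message when it is missing. The following is a minimal sketch, not code taken from the example: `APIKey` is a hypothetical helper, and the lookup function is passed in so the check can be exercised without touching the real environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// APIKey returns the trimmed GEMINI_API_KEY using the supplied lookup
// function, or an error when the variable is unset or blank.
// Hypothetical helper; the example's own startup check may differ.
func APIKey(getenv func(string) string) (string, error) {
	key := strings.TrimSpace(getenv("GEMINI_API_KEY"))
	if key == "" {
		return "", errors.New("GEMINI_API_KEY is not set; export it before running")
	}
	return key, nil
}

func main() {
	if _, err := APIKey(os.Getenv); err != nil {
		fmt.Println("configuration problem:", err)
		return
	}
	fmt.Println("GEMINI_API_KEY found")
}
```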
Quick Start
# Navigate to the example directory
cd sdk/examples/voice-interview
# Run with default settings (ASM mode, Classic Rock topic)
go run .
# Run with a specific topic
go run . --topic programming
# Run in VAD mode (turn-based with TTS)
go run . --mode vad --topic space
# Enable webcam for visual context
go run . --webcam --topic movies
# List all available topics
go run . --list-topics
Command-Line Options
| Flag | Default | Description |
|---|---|---|
| --mode | asm | Audio mode: asm (native audio) or vad (turn-based with TTS) |
| --topic | classic-rock | Interview topic (see --list-topics) |
| --webcam | false | Enable webcam for visual context |
| --pack | ./interview.pack.json | Path to PromptPack file |
| --no-ui | false | Disable rich terminal UI |
| --verbose | false | Enable verbose logging |
| --list-topics | - | List available interview topics and exit |
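For readers who want to see how such a flag set could be declared, here is a sketch using Go's standard `flag` package, which accepts both `-mode` and `--mode` spellings. The `Options` struct and `ParseOptions` function are hypothetical names; the example's main.go may organize its flags differently.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Options mirrors the flags in the table above (hypothetical struct).
type Options struct {
	Mode, Topic, Pack                 string
	Webcam, NoUI, Verbose, ListTopics bool
}

// ParseOptions parses args with the documented defaults and rejects
// any --mode value other than "asm" or "vad".
func ParseOptions(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("voice-interview", flag.ContinueOnError)
	fs.StringVar(&o.Mode, "mode", "asm", "audio mode: asm or vad")
	fs.StringVar(&o.Topic, "topic", "classic-rock", "interview topic")
	fs.BoolVar(&o.Webcam, "webcam", false, "enable webcam for visual context")
	fs.StringVar(&o.Pack, "pack", "./interview.pack.json", "path to PromptPack file")
	fs.BoolVar(&o.NoUI, "no-ui", false, "disable rich terminal UI")
	fs.BoolVar(&o.Verbose, "verbose", false, "enable verbose logging")
	fs.BoolVar(&o.ListTopics, "list-topics", false, "list topics and exit")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.Mode != "asm" && o.Mode != "vad" {
		return o, fmt.Errorf("invalid --mode %q: want asm or vad", o.Mode)
	}
	return o, nil
}

func main() {
	opts, err := ParseOptions(os.Args[1:])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", opts)
}
```

Validating the mode right after parsing keeps the rest of the program free of checks for unsupported values.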
Architecture
Pipeline Stages
This example demonstrates the stage-based pipeline architecture:
┌─────────────────────────────────────────────────────────────────┐
│                    Voice Interview Pipeline                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │    Audio     │    │     VAD      │    │   Provider   │       │
│  │   Capture    │───▶│    Stage     │───▶│    Stage     │       │
│  │    Stage     │    │ (if VAD mode)│    │  (ASM/Text)  │       │
│  └──────────────┘    └──────────────┘    └──────────────┘       │
│         │                                       │               │
│         │                                       ▼               │
│         │            ┌──────────────┐    ┌──────────────┐       │
│         │            │     TTS      │◀───│   Response   │       │
│         │            │    Stage     │    │  Processing  │       │
│         │            │ (if VAD mode)│    │              │       │
│         │            └──────────────┘    └──────────────┘       │
│         │                   │                                   │
│         ▼                   ▼                                   │
│  ┌──────────────────────────────────────────────────────┐       │
│  │                    Audio Playback                    │       │
│  └──────────────────────────────────────────────────────┘       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
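The chaining idea in the diagram can be approximated in plain Go with goroutines connected by channels. This sketch is not PromptKit's API: `Stage`, `Map`, `Chain`, and `RunDemo` are hypothetical names, and strings stand in for audio and text elements.

```go
package main

import (
	"fmt"
	"strings"
)

// Stage is a hypothetical stage type: it consumes elements from in and
// returns a channel of processed elements.
type Stage func(in <-chan string) <-chan string

// Map builds a Stage that applies fn to every element in order.
func Map(fn func(string) string) Stage {
	return func(in <-chan string) <-chan string {
		out := make(chan string)
		go func() {
			defer close(out)
			for v := range in {
				out <- fn(v)
			}
		}()
		return out
	}
}

// Chain connects stages so each one consumes the previous stage's output.
func Chain(src <-chan string, stages ...Stage) <-chan string {
	for _, s := range stages {
		src = s(src)
	}
	return src
}

// RunDemo pushes chunks through stand-in provider and TTS stages and
// collects the results.
func RunDemo(chunks []string) []string {
	src := make(chan string)
	go func() {
		defer close(src)
		for _, c := range chunks {
			src <- c
		}
	}()
	provider := Map(strings.ToUpper) // stand-in for the provider stage
	tts := Map(func(s string) string { return "audio(" + s + ")" })
	var got []string
	for v := range Chain(src, provider, tts) {
		got = append(got, v)
	}
	return got
}

func main() {
	fmt.Println(RunDemo([]string{"hello", "world"}))
}
```

Because every stage starts work as soon as its first element arrives, downstream stages can run while upstream stages are still producing, which is the property a streaming pipeline relies on.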
ASM Mode (Audio Streaming Model)
In ASM mode, the pipeline uses native bidirectional audio streaming:
- Continuous streaming: Audio flows continuously in both directions
- No turn detection needed: The model handles conversation flow
- Lower latency: Real-time response without waiting for turn boundaries
- Requires: Gemini 2.0 Flash Exp or similar ASM-capable models
VAD Mode (Voice Activity Detection)
In VAD mode, the pipeline uses turn-based processing:
- Turn detection: VAD stage detects speech/silence boundaries
- Accumulation: Speech is accumulated until silence is detected
- TTS output: Text responses are converted to speech
- Works with: Any text-based LLM + TTS service
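The turn detection and accumulation steps can be illustrated with a simple energy-based detector. This is a sketch under stated assumptions rather than the example's VAD stage: it assumes 16-bit PCM frames, reuses the EnergyThreshold value of 500 from the audio configuration, and introduces a hypothetical `TurnDetector` type that ends a turn after a fixed number of quiet frames.

```go
package main

import (
	"fmt"
	"math"
)

// EnergyThreshold mirrors the value listed for audio/portaudio.go; how
// the example applies it internally is an assumption.
const EnergyThreshold = 500

// RMS returns the root-mean-square amplitude of a 16-bit PCM frame.
func RMS(frame []int16) float64 {
	if len(frame) == 0 {
		return 0
	}
	var sum float64
	for _, s := range frame {
		sum += float64(s) * float64(s)
	}
	return math.Sqrt(sum / float64(len(frame)))
}

// TurnDetector (hypothetical) accumulates speech frames and reports a
// finished turn after maxSilent consecutive quiet frames.
type TurnDetector struct {
	maxSilent int
	silent    int
	buf       []int16
}

func NewTurnDetector(maxSilent int) *TurnDetector {
	return &TurnDetector{maxSilent: maxSilent}
}

// Push feeds one frame. It returns the accumulated speech and true once
// the turn has ended; otherwise it returns nil and false.
func (d *TurnDetector) Push(frame []int16) ([]int16, bool) {
	if RMS(frame) >= EnergyThreshold {
		d.buf = append(d.buf, frame...)
		d.silent = 0
		return nil, false
	}
	if len(d.buf) == 0 {
		return nil, false // still waiting for speech to start
	}
	d.silent++
	if d.silent < d.maxSilent {
		return nil, false
	}
	turn := d.buf
	d.buf, d.silent = nil, 0
	return turn, true
}

func main() {
	loud := []int16{1000, -1000, 1000, -1000}
	quiet := []int16{10, -10, 10, -10}
	d := NewTurnDetector(2)
	for i, f := range [][]int16{quiet, loud, loud, quiet, quiet} {
		if turn, done := d.Push(f); done {
			fmt.Printf("turn ended at frame %d with %d samples\n", i, len(turn))
		}
	}
}
```

Requiring several quiet frames before closing a turn keeps short pauses within a sentence from splitting one answer into two.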
Project Structure
voice-interview/
├── main.go                 # Entry point with mode selection
├── interview.pack.json     # PromptPack configuration
├── README.md               # This file
├── audio/
│   └── portaudio.go        # Audio capture/playback
├── video/
│   └── webcam.go           # Webcam capture (optional)
├── interview/
│   ├── controller.go       # Interview orchestration
│   ├── state.go            # State management
│   └── questions.go        # Question banks
└── ui/
    └── app.go              # Bubbletea terminal UI
Customization
Adding New Topics
Edit interview/questions.go to add new question banks:
func myCustomQuestions() *QuestionBank {
	return &QuestionBank{
		Topic:       "My Custom Topic",
		Description: "Description of the topic",
		Questions: []Question{
			{
				ID:       "custom-1",
				Text:     "Your question here?",
				Answer:   "Expected answer",
				Hint:     "Optional hint",
				Category: "category",
			},
			// Add more questions...
		},
	}
}
Then register it in GetQuestionBank().
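The registration step is not shown in this README, so the following is only a sketch of one way it could look. The `GetQuestionBank` signature, the `banks` map, and the `my-custom-topic` key are all assumptions; check interview/questions.go for the real shape before copying anything.

```go
package main

import "fmt"

// Question and QuestionBank repeat the fields used in the snippet above;
// the real definitions may carry more.
type Question struct {
	ID, Text, Answer, Hint, Category string
}

type QuestionBank struct {
	Topic       string
	Description string
	Questions   []Question
}

func myCustomQuestions() *QuestionBank {
	return &QuestionBank{
		Topic:     "My Custom Topic",
		Questions: []Question{{ID: "custom-1", Text: "Your question here?"}},
	}
}

// banks maps a --topic value to its constructor (assumed layout).
var banks = map[string]func() *QuestionBank{
	"my-custom-topic": myCustomQuestions,
}

// GetQuestionBank is sketched with an assumed signature: it looks up the
// topic key and returns an error for unknown topics.
func GetQuestionBank(topic string) (*QuestionBank, error) {
	build, ok := banks[topic]
	if !ok {
		return nil, fmt.Errorf("unknown topic %q", topic)
	}
	return build(), nil
}

func main() {
	bank, err := GetQuestionBank("my-custom-topic")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(bank.Topic, len(bank.Questions))
}
```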
Modifying the Interview Flow
The interview behavior is defined in interview.pack.json. Modify the system template to change:
- Interviewer personality
- Scoring guidelines
- Feedback style
- Response format
Custom Audio Configuration
Adjust audio settings in audio/portaudio.go:
const (
	InputSampleRate      = 16000 // Microphone sample rate
	OutputSampleRate     = 24000 // Speaker sample rate
	Channels             = 1     // Mono audio
	InputFramesPerBuffer = 1600  // 100ms chunks
	EnergyThreshold      = 500   // VAD sensitivity
)
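When changing these values, it helps to recompute the chunk duration and buffer size they imply. The sketch below checks the "100ms chunks" figure; `ChunkDuration` and `ChunkBytes` are hypothetical helpers, and the 2 bytes per sample assumes 16-bit PCM, which this README does not state.

```go
package main

import (
	"fmt"
	"time"
)

const (
	InputSampleRate      = 16000 // frames per second
	InputFramesPerBuffer = 1600  // frames per chunk
	Channels             = 1     // mono
	bytesPerSample       = 2     // assumption: 16-bit PCM
)

// ChunkDuration returns how long a buffer of frames lasts at sampleRate.
func ChunkDuration(frames, sampleRate int) time.Duration {
	return time.Duration(frames) * time.Second / time.Duration(sampleRate)
}

// ChunkBytes returns the size of one buffer in bytes.
func ChunkBytes(frames, channels, sampleBytes int) int {
	return frames * channels * sampleBytes
}

func main() {
	fmt.Println(ChunkDuration(InputFramesPerBuffer, InputSampleRate))       // 100ms
	fmt.Println(ChunkBytes(InputFramesPerBuffer, Channels, bytesPerSample)) // 3200
}
```

Halving InputFramesPerBuffer to 800 would give 50ms chunks, trading lower capture latency for more frequent buffer handling.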
Troubleshooting
No Audio Input
- Check microphone permissions in system settings
- Verify PortAudio installation: brew info portaudio
- List audio devices: The app will show available devices on startup
Webcam Not Working
- Ensure ffmpeg is installed: ffmpeg -version
- Check camera permissions
- Try a different device index: The app uses device 0 by default
API Errors
- Verify GEMINI_API_KEY is set correctly
- Check API quota and rate limits
- Ensure you have access to the required models:
  - ASM mode: gemini-2.0-flash-exp
  - VAD mode: gemini-2.5-flash
UI Display Issues
Run with the --no-ui flag for simple terminal output if the rich UI doesn’t render correctly.
Example Session
╔══════════════════════════════════════════════════════════════╗
║          🎤 Voice Interview System - PromptKit Demo          ║
╠══════════════════════════════════════════════════════════════╣
║  Topic: Classic Rock Music                                   ║
║  Mode: ASM (Native Audio)                                    ║
║  Questions: 5                                                ║
╠══════════════════════════════════════════════════════════════╣
║  Controls:                                                   ║
║    • Speak naturally into your microphone                    ║
║    • Press Ctrl+C to end the interview                       ║
╚══════════════════════════════════════════════════════════════╝
🎤 [████████████████░░░░░░░░░░░░░░] 53%
Question 1 of 5
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Q1: Which band released the album 'Dark Side of the Moon' in 1973?
🤖 That's correct! Pink Floyd released this iconic album...
👤 Pink Floyd
Score: 10/50 │ Progress: 20%
Related Examples
- duplex-streaming - Basic duplex streaming example
- streaming - Text streaming example
- multimodal - Image/audio input example
License
This example is part of PromptKit and is available under the same license.