
Voice Interview System

A voice-enabled interview demo that showcases PromptKit’s stage-based pipeline architecture, with support for streaming, voice activity detection (VAD), text-to-speech (TTS), and ASM-based models.

  • Dual Audio Modes

    • ASM Mode: Native bidirectional audio streaming with Gemini 2.0 (continuous, real-time)
    • VAD Mode: Voice Activity Detection with turn-based processing and TTS output
  • Stage-Based Pipeline Architecture

    • Exercises PromptKit’s streaming pipeline end to end, from audio capture to playback
    • Real-time audio processing through pipeline stages
    • Seamless integration with multiple provider modes
  • Multiple Interview Topics

    • Classic Rock Music
    • Space Exploration
    • Programming & Computer Science
    • World History
    • Movies & Cinema
  • Optional Webcam Integration

    • Send periodic webcam frames for multimodal context
    • Visual engagement analysis
  • Rich Terminal UI

    • Real-time audio level visualization
    • Progress tracking with score display
    • Live transcript display
    • Beautiful, interactive interface using Bubbletea

Install the system dependencies (PortAudio and ffmpeg):

# macOS
brew install portaudio ffmpeg
# Ubuntu/Debian
sudo apt-get install portaudio19-dev ffmpeg
# Windows
# Download PortAudio from http://www.portaudio.com/
# Download ffmpeg from https://ffmpeg.org/download.html

Set your Gemini API key:

export GEMINI_API_KEY=your_api_key_here

Run the example:

# Navigate to the example directory
cd sdk/examples/voice-interview
# Run with default settings (ASM mode, Classic Rock topic)
go run .
# Run with a specific topic
go run . --topic programming
# Run in VAD mode (turn-based with TTS)
go run . --mode vad --topic space
# Enable webcam for visual context
go run . --webcam --topic movies
# List all available topics
go run . --list-topics

Flag           Default                 Description
--mode         asm                     Audio mode: asm (native audio) or vad (turn-based with TTS)
--topic        classic-rock            Interview topic (see --list-topics)
--webcam       false                   Enable webcam for visual context
--pack         ./interview.pack.json   Path to PromptPack file
--no-ui        false                   Disable rich terminal UI
--verbose      false                   Enable verbose logging
--list-topics  -                       List available interview topics and exit
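
The flags above could be declared with Go's standard `flag` package roughly as follows. This is a sketch only: the `Options` struct, the `parseOptions` helper, and the mode validation are assumptions made for illustration, and the example's actual `main.go` may be organized differently.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Options mirrors the documented flags. The struct and its field names are
// hypothetical; they are not taken from the example's source.
type Options struct {
	Mode       string
	Topic      string
	Webcam     bool
	Pack       string
	NoUI       bool
	Verbose    bool
	ListTopics bool
}

// parseOptions parses args (without the program name) into Options,
// using the documented defaults.
func parseOptions(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("voice-interview", flag.ContinueOnError)
	fs.StringVar(&o.Mode, "mode", "asm", "Audio mode: asm (native audio) or vad (turn-based with TTS)")
	fs.StringVar(&o.Topic, "topic", "classic-rock", "Interview topic (see --list-topics)")
	fs.BoolVar(&o.Webcam, "webcam", false, "Enable webcam for visual context")
	fs.StringVar(&o.Pack, "pack", "./interview.pack.json", "Path to PromptPack file")
	fs.BoolVar(&o.NoUI, "no-ui", false, "Disable rich terminal UI")
	fs.BoolVar(&o.Verbose, "verbose", false, "Enable verbose logging")
	fs.BoolVar(&o.ListTopics, "list-topics", false, "List available interview topics and exit")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	// Assumed validation: reject unknown modes before any pipeline is built.
	if o.Mode != "asm" && o.Mode != "vad" {
		return o, fmt.Errorf("invalid --mode %q: want asm or vad", o.Mode)
	}
	return o, nil
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("mode=%s topic=%s webcam=%t pack=%s\n", opts.Mode, opts.Topic, opts.Webcam, opts.Pack)
}
```

Go's `flag` package accepts both `-mode` and `--mode`, so the double-dash form used in the commands above works unchanged.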

This example demonstrates the stage-based pipeline architecture:

┌─────────────────────────────────────────────────────────────────┐
│                    Voice Interview Pipeline                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │   Audio      │    │    VAD       │    │   Provider   │       │
│  │   Capture    │───▶│    Stage     │───▶│   Stage      │       │
│  │   Stage      │    │ (if VAD mode)│    │  (ASM/Text)  │       │
│  └──────────────┘    └──────────────┘    └──────────────┘       │
│         │                                       │               │
│         │                                       ▼               │
│         │            ┌──────────────┐    ┌──────────────┐       │
│         │            │    TTS       │◀───│   Response   │       │
│         │            │    Stage     │    │  Processing  │       │
│         │            │ (if VAD mode)│    │              │       │
│         │            └──────────────┘    └──────────────┘       │
│         │                   │                                   │
│         ▼                   ▼                                   │
│  ┌──────────────────────────────────────────────────────┐       │
│  │                    Audio Playback                    │       │
│  └──────────────────────────────────────────────────────┘       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

In ASM mode, the pipeline uses native bidirectional audio streaming:

  • Continuous streaming: Audio flows continuously in both directions
  • No turn detection needed: The model handles conversation flow
  • Lower latency: Real-time response without waiting for turn boundaries
  • Requires: Gemini 2.0 Flash Exp or similar ASM-capable models

In VAD mode, the pipeline uses turn-based processing:

  • Turn detection: VAD stage detects speech/silence boundaries
  • Accumulation: Speech is accumulated until silence detected
  • TTS output: Text responses are converted to speech
  • Works with: Any text-based LLM + TTS service
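
The difference between the two modes comes down to which stages sit between capture and playback. The sketch below lists the stages per mode as plain strings, following the diagram. The `stagesFor` function and the stage labels are illustrative names, not PromptKit identifiers, and the real example wires actual pipeline stages rather than strings.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// stagesFor returns the ordered stage labels for a mode, following the
// pipeline diagram. The labels are hypothetical, chosen for illustration.
func stagesFor(mode string) ([]string, error) {
	switch mode {
	case "asm":
		// The model consumes and produces audio directly, so neither
		// turn detection (VAD) nor speech synthesis (TTS) is needed.
		return []string{"audio-capture", "provider(asm)", "response-processing", "audio-playback"}, nil
	case "vad":
		// Speech is segmented into turns, answered as text, then synthesized.
		return []string{"audio-capture", "vad", "provider(text)", "response-processing", "tts", "audio-playback"}, nil
	default:
		return nil, fmt.Errorf("unknown mode %q", mode)
	}
}

func main() {
	for _, mode := range []string{"asm", "vad"} {
		stages, err := stagesFor(mode)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%s: %s\n", mode, strings.Join(stages, " -> "))
	}
}
```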

voice-interview/
├── main.go                 # Entry point with mode selection
├── interview.pack.json     # PromptPack configuration
├── README.md               # This file
├── audio/
│   └── portaudio.go        # Audio capture/playback
├── video/
│   └── webcam.go           # Webcam capture (optional)
├── interview/
│   ├── controller.go       # Interview orchestration
│   ├── state.go            # State management
│   └── questions.go        # Question banks
└── ui/
    └── app.go              # Bubbletea terminal UI

Edit interview/questions.go to add new question banks:

func myCustomQuestions() *QuestionBank {
	return &QuestionBank{
		Topic:       "My Custom Topic",
		Description: "Description of the topic",
		Questions: []Question{
			{
				ID:       "custom-1",
				Text:     "Your question here?",
				Answer:   "Expected answer",
				Hint:     "Optional hint",
				Category: "category",
			},
			// Add more questions...
		},
	}
}

Then register it in GetQuestionBank().
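
As a rough sketch of that registration step, the lookup could be a switch on the topic key. The signature of `GetQuestionBank`, the `"my-custom-topic"` key, and the string field types below are assumptions made so the example compiles on its own. Check `interview/questions.go` for the real definitions before copying anything.

```go
package main

import "fmt"

// Question and QuestionBank repeat the fields used in the snippet above.
// The field types are assumed to be plain strings.
type Question struct {
	ID, Text, Answer, Hint, Category string
}

type QuestionBank struct {
	Topic       string
	Description string
	Questions   []Question
}

func myCustomQuestions() *QuestionBank {
	return &QuestionBank{
		Topic:       "My Custom Topic",
		Description: "Description of the topic",
		Questions: []Question{
			{ID: "custom-1", Text: "Your question here?", Answer: "Expected answer", Hint: "Optional hint", Category: "category"},
		},
	}
}

// GetQuestionBank maps a --topic key to its question bank.
// Hypothetical signature: the real function may differ.
func GetQuestionBank(topic string) (*QuestionBank, error) {
	switch topic {
	case "my-custom-topic": // hypothetical key for the new bank
		return myCustomQuestions(), nil
	default:
		return nil, fmt.Errorf("unknown topic %q", topic)
	}
}

func main() {
	bank, err := GetQuestionBank("my-custom-topic")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s: %d question(s)\n", bank.Topic, len(bank.Questions))
}
```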

The interview behavior is defined in interview.pack.json. Modify the system template to change:

  • Interviewer personality
  • Scoring guidelines
  • Feedback style
  • Response format

Adjust audio settings in audio/portaudio.go:

const (
	InputSampleRate      = 16000 // Microphone sample rate (Hz)
	OutputSampleRate     = 24000 // Speaker sample rate (Hz)
	Channels             = 1     // Mono audio
	InputFramesPerBuffer = 1600  // 100ms chunks
	EnergyThreshold      = 500   // VAD sensitivity
)
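
To see how these values relate, the sketch below derives the chunk duration from the sample rate and buffer size (1600 frames at 16000 Hz is 100 ms) and applies a simple energy check to one buffer. Interpreting `EnergyThreshold` as an RMS amplitude over 16-bit PCM samples is an assumption. The example's VAD stage may compute energy differently, and `chunkDuration`, `rms`, and `isSpeech` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

const (
	InputSampleRate      = 16000 // Microphone sample rate (Hz)
	InputFramesPerBuffer = 1600  // Frames per input buffer
	EnergyThreshold      = 500   // VAD sensitivity (assumed to be an RMS amplitude)
)

// chunkDuration returns how much audio one input buffer holds.
func chunkDuration() time.Duration {
	return time.Duration(InputFramesPerBuffer) * time.Second / InputSampleRate
}

// rms returns the root-mean-square amplitude of 16-bit PCM samples.
func rms(samples []int16) float64 {
	if len(samples) == 0 {
		return 0
	}
	var sum float64
	for _, s := range samples {
		v := float64(s)
		sum += v * v
	}
	return math.Sqrt(sum / float64(len(samples)))
}

// isSpeech reports whether a buffer's energy exceeds the threshold.
func isSpeech(samples []int16) bool {
	return rms(samples) > EnergyThreshold
}

func main() {
	quiet := make([]int16, InputFramesPerBuffer) // digital silence
	loud := make([]int16, InputFramesPerBuffer)
	for i := range loud {
		// 440 Hz sine tone with a peak amplitude of 4000.
		loud[i] = int16(4000 * math.Sin(2*math.Pi*440*float64(i)/InputSampleRate))
	}
	fmt.Println("chunk duration:", chunkDuration())
	fmt.Println("quiet buffer is speech:", isSpeech(quiet))
	fmt.Println("loud buffer is speech:", isSpeech(loud))
}
```

Under this interpretation, raising `EnergyThreshold` makes the detector less sensitive to background noise, at the cost of missing quiet speech.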

If no audio is captured:

  1. Check microphone permissions in system settings
  2. Verify the PortAudio installation (on macOS: brew info portaudio)
  3. List audio devices: the app shows available devices on startup

If the webcam does not work:

  1. Ensure ffmpeg is installed: ffmpeg -version
  2. Check camera permissions
  3. Try a different device index: the app uses device 0 by default

If API calls fail:

  1. Verify GEMINI_API_KEY is set correctly
  2. Check API quota and rate limits
  3. Ensure you have access to the required models:
    • ASM mode: gemini-2.0-flash-exp
    • VAD mode: gemini-2.5-flash

If the rich UI doesn’t render correctly, run with the --no-ui flag for plain terminal output.

╔══════════════════════════════════════════════════════════════╗
║  🎤 Voice Interview System - PromptKit Demo                  ║
╠══════════════════════════════════════════════════════════════╣
║  Topic: Classic Rock Music                                   ║
║  Mode: ASM (Native Audio)                                    ║
║  Questions: 5                                                ║
╠══════════════════════════════════════════════════════════════╣
║  Controls:                                                   ║
║    • Speak naturally into your microphone                    ║
║    • Press Ctrl+C to end the interview                       ║
╚══════════════════════════════════════════════════════════════╝
🎤 [████████████████░░░░░░░░░░░░░░] 53%
Question 1 of 5
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Q1: Which band released the album 'Dark Side of the Moon' in 1973?
🤖 That's correct! Pink Floyd released this iconic album...
👤 Pink Floyd
Score: 10/50 │ Progress: 20%
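
The numbers in this sample are consistent with simple arithmetic: 16 of 30 bar cells filled for a 53% level, and 1 of 5 questions answered for 20% progress. The sketch below reproduces those two lines. `levelBar` and `progressPercent` are hypothetical helpers, not functions from `ui/app.go`, and the 10 points per question is inferred from the 10/50 score shown above.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// levelBar renders a level in [0, 1] as a fixed-width bar plus a percentage.
func levelBar(level float64, width int) string {
	level = math.Max(0, math.Min(1, level)) // clamp out-of-range input
	filled := int(math.Round(level * float64(width)))
	return fmt.Sprintf("[%s%s] %d%%",
		strings.Repeat("█", filled),
		strings.Repeat("░", width-filled),
		int(math.Round(level*100)))
}

// progressPercent returns the share of answered questions as a whole percent.
func progressPercent(answered, total int) int {
	if total <= 0 {
		return 0
	}
	return answered * 100 / total
}

func main() {
	const pointsPerQuestion = 10 // assumed from the 10/50 score in the sample
	answered, total := 1, 5

	fmt.Println("🎤", levelBar(0.53, 30))
	fmt.Printf("Score: %d/%d │ Progress: %d%%\n",
		answered*pointsPerQuestion, total*pointsPerQuestion, progressPercent(answered, total))
}
```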

This example is part of PromptKit and is available under the same license.