Tutorial 3: Multi-Turn Conversations
Learn how to test complex multi-turn conversations that maintain context across exchanges.
What You’ll Learn
- Create multi-turn conversation flows
- Test context retention across turns
- Handle conversation state
- Validate conversation coherence
- Test conversation branching
Prerequisites
- Completed Tutorial 1 and Tutorial 2
- Basic understanding of conversation design
Why Multi-Turn Testing?
Real LLM applications involve conversations, not just single Q&A:
- Customer support: Back-and-forth troubleshooting
- Chatbots: Building rapport over multiple exchanges
- Assistants: Following complex instructions step-by-step
- Agents: Maintaining task state across turns
Multi-turn testing ensures:
- Context is retained between messages
- Responses reference previous exchanges
- Conversation flow feels natural
- State management works correctly
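Concretely, a multi-turn scenario is just a list of user turns with assertions attached to the turns you care about. Here is a minimal sketch using the same fields as the full example in Step 1; the content and pattern choices are placeholders for illustration:

```yaml
# Minimal two-turn sketch: the second turn only makes sense if the first
# turn's context (the order) is still available to the model.
spec:
  task_type: support
  turns:
    - role: user
      content: "I ordered a lamp last week"
    - role: user
      content: "When will it arrive?"
      assertions:
        - type: content_includes
          params:
            patterns: ["order"]
          message: "Should answer with reference to the earlier order"
```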
Step 1: Basic Multi-Turn Scenario
Create scenarios/support-conversation.yaml:
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: account-issue-resolution
  labels:
    category: multi-turn
    type: customer-service
spec:
  task_type: support

  turns:
    # Turn 1: Initial problem statement
    - role: user
      content: "I can't access my account"
      assertions:
        - type: content_includes
          params:
            patterns: ["help"]
          message: "Should offer help"

    # Turn 2: Providing details
    - role: user
      content: "I get an error message saying 'Invalid credentials'"
      assertions:
        - type: content_matches
          params:
            pattern: "(?i)(password|reset|credentials)"
          message: "Should reference password reset"

    # Turn 3: Follow-up question
    - role: user
      content: "How long will it take?"
      assertions:
        - type: content_includes
          params:
            patterns: ["time"]
          message: "Should provide timeframe"

    # Turn 4: Additional inquiry
    - role: user
      content: "Will I lose my saved preferences?"
      assertions:
        - type: content_includes
          params:
            patterns: ["preferences"]
          message: "Should address preferences concern"
```

Step 2: Test Context Retention
Run the test:
```bash
promptarena run --scenario support-conversation
```

The references_previous assertion checks whether a response demonstrates awareness of earlier turns.
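The scenario above uses only content checks; to make context retention explicit, a references_previous assertion can be attached to a later turn. The snippet below is a sketch only: the assertion type name comes from this tutorial, but its parameters are not documented here, so none are assumed.

```yaml
# Sketch: adding a context-retention check to turn 3 of the scenario above.
# references_previous is used without params because this tutorial does not
# show which parameters it accepts.
- role: user
  content: "How long will it take?"
  assertions:
    - type: references_previous
      message: "Should show awareness of the earlier credentials discussion"
```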
Step 3: Information Gathering Flow
Create scenarios/progressive-disclosure.yaml:
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: flight-booking
  labels:
    category: progressive
    type: multi-turn
spec:
  task_type: support
  description: "Step-by-step information collection"

  context_metadata:
    session_goal: "Book a flight"

  turns:
    # Turn 1: Initial inquiry
    - role: user
      content: "I need to book a flight"
      assertions:
        - type: content_includes
          params:
            patterns: ["destination"]
          message: "Should ask for destination"

    # Turn 2: Provide destination
    - role: user
      content: "To New York"
      assertions:
        - type: content_includes
          params:
            patterns: ["date"]
          message: "Should ask for date"

    # Turn 3: Provide date
    - role: user
      content: "Next Friday"
      assertions:
        - type: content_includes
          params:
            patterns: ["class"]
          message: "Should ask for class preferences"

    # Turn 4: Complete booking
    - role: user
      content: "Economy class, window seat"
      assertions:
        - type: content_includes
          params:
            patterns: ["confirm"]
          message: "Should confirm booking details"
```

Step 4: Conversation Branching
Test different conversation paths:
```yaml
# Path A: Successful resolution
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: happy-path-conversation
  labels:
    path: happy
spec:
  task_type: support

  turns:
    - role: user
      content: "My order hasn't arrived"
    - role: user
      content: "Order number is #12345"
    - role: user
      content: "Yes, the address is correct"
    - role: user
      content: "Great, thank you!"
      assertions:
        - type: content_includes
          params:
            patterns: ["welcome"]
          message: "Should acknowledge thanks positively"
---
# Path B: Escalation needed
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: escalation-path
  labels:
    path: escalation
spec:
  task_type: support

  turns:
    - role: user
      content: "My order hasn't arrived"
    - role: user
      content: "Order number is #12345"
    - role: user
      content: "No, I need it urgently"
    - role: user
      content: "This is unacceptable"
      assertions:
        - type: content_includes
          params:
            patterns: ["supervisor"]
          message: "Should offer escalation"
```
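Because the two paths are separate scenarios, you can run them individually and compare how each branch behaves. A minimal sketch, using the scenario names defined above with the --scenario flag from Step 2:

```bash
# Run each branch on its own and compare the reports
promptarena run --scenario happy-path-conversation
promptarena run --scenario escalation-path

# Or run both together (comma-separated, as in Step 8)
promptarena run --scenario happy-path-conversation,escalation-path
```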
Step 5: Testing Conversation Memory

Create scenarios/memory-test.yaml:
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: long-term-memory-test
  labels:
    category: memory
    type: context-retention
spec:
  task_type: support

  turns:
    # Turn 1: Introduction
    - role: user
      content: "Hi, my name is Alice and I'm calling about my account"
      assertions:
        - type: content_includes
          params:
            patterns: ["Alice"]
          message: "Should acknowledge name"

    # Turns 2-4: Other topics
    - role: user
      content: "What are your business hours?"
    - role: user
      content: "Do you offer international shipping?"
    - role: user
      content: "What's your return policy?"

    # Turn 5: Reference earlier context
    - role: user
      content: "What was my name again?"
      assertions:
        - type: content_includes
          params:
            patterns: ["Alice"]
          message: "Should remember name from turn 1"
```
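The same pattern extends to any detail from the opening turn, not just the name. A sketch of one more probing turn; the expectation that the assistant recalls the account issue is an assumption for illustration:

```yaml
# Sketch: probe a second detail from turn 1
- role: user
  content: "And what was I calling about?"
  assertions:
    - type: content_includes
      params:
        patterns: ["account"]
      message: "Should recall that Alice was calling about her account"
```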
Step 6: Conditional Responses

Test context-dependent responses:
```yaml
# Premium user scenario
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: premium-user-support
spec:
  task_type: support

  context_metadata:
    user_tier: premium
    account_id: "P-12345"

  turns:
    - role: user
      content: "I need help with my account"
      assertions:
        - type: content_includes
          params:
            patterns: ["premium"]
          message: "Should recognize premium tier"
---
# Basic user scenario
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: basic-user-support
spec:
  task_type: support

  context_metadata:
    user_tier: basic
    account_id: "B-67890"

  turns:
    - role: user
      content: "I need help with my account"
      assertions:
        - type: content_includes
          params:
            patterns: ["help"]
          message: "Should offer helpful support"
```

Step 7: Error Recovery
Test how the system handles conversation errors:
```yaml
# Clarification scenario
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: clarification-request
  labels:
    category: error-recovery
spec:
  task_type: support

  turns:
    - role: user
      content: "I need that thing"
      assertions:
        - type: content_includes
          params:
            patterns: ["clarify"]
          message: "Should ask for clarification"

    - role: user
      content: "Sorry, I meant the refund policy"
      assertions:
        - type: content_includes
          params:
            patterns: ["refund"]
          message: "Should proceed with clarified topic"
---
# Misunderstanding correction
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: misunderstanding-correction
  labels:
    category: correction
spec:
  task_type: support

  turns:
    - role: user
      content: "When can I get my order?"

    - role: user
      content: "Actually, I meant to ask about returns, not delivery"
      assertions:
        - type: content_includes
          params:
            patterns: ["return"]
          message: "Should pivot to the corrected topic"
```

Step 8: Run Multi-Turn Tests
```bash
# Run all multi-turn tests
promptarena run --scenario support-conversation,progressive-disclosure,memory-test

# Generate detailed HTML report
promptarena run --format html

# View conversation flows
open out/report-*.html
```

Analyzing Multi-Turn Results
Review JSON Output
```bash
cat out/results.json | jq '.results[]
  | select(.scenario == "Account Issue Resolution")
  | {turn: .turn, user_message: .user_message, response: .response, assertions_passed: .assertions_passed}'
```

Check Context Retention
```bash
# Find tests with context retention issues
cat out/results.json | jq '.results[] | select(.assertions[] | select(.type == "references_previous" and .passed == false))'
```
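For a quick overview across scenarios, the same JSON can be summarized with jq. This is a sketch that assumes the results.json shape implied by the two queries above (a .results array whose entries carry .scenario and an .assertions list with .passed flags); adjust the paths if your output differs:

```bash
# Count failed assertions per scenario (field names assumed from the queries above)
cat out/results.json | jq -r '
  .results[]
  | {scenario: .scenario, failed: ([.assertions[] | select(.passed == false)] | length)}
  | select(.failed > 0)
  | "\(.scenario): \(.failed) failed assertion(s)"'
```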
Advanced Patterns

Self-Play Testing
Test both sides of a conversation:
```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: self-play-customer-interaction
  labels:
    category: self-play
spec:
  task_type: support

  self_play:
    enabled: true
    persona: frustrated-customer
    max_turns: 10
    exit_conditions:
      - satisfaction_expressed
      - escalation_requested
```

Run self-play mode:

```bash
promptarena run --selfplay --scenario self-play-customer
```

Conversation Patterns
Information Extraction
```yaml
spec:
  turns:
    - role: user
      content: "Book a table for 4 people tomorrow at 7pm"
      assertions:
        - type: content_includes
          params:
            patterns: ["4"]
          message: "Should capture party size"
```
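This tutorial does not state whether content_includes requires every listed pattern to appear; if it does, the same turn could check several extracted details at once. A sketch under that assumption:

```yaml
# Sketch: several extracted details in one check
# (assumes content_includes requires all listed patterns to appear)
assertions:
  - type: content_includes
    params:
      patterns: ["4", "tomorrow", "7pm"]
    message: "Should capture party size, date, and time"
```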
Confirmation Loop

```yaml
spec:
  turns:
    - role: user
      content: "Cancel my subscription"

    - role: user
      content: "Yes, I'm sure"
      assertions:
        - type: content_includes
          params:
            patterns: ["confirm"]
          message: "Should confirm cancellation"

    - role: user
      content: "Can you tell me what I'll lose?"
      assertions:
        - type: content_includes
          params:
            patterns: ["lose"]
          message: "Should explain consequences"
```

Best Practices
1. Test Realistic Conversation Flows
Model actual user interactions:
```yaml
# ✅ Good - natural conversation
spec:
  turns:
    - role: user
      content: "Hi, I have a question"
    - role: user
      content: "About shipping times"
    - role: user
      content: "To California"
```

```yaml
# ❌ Avoid - too structured
spec:
  turns:
    - role: user
      content: "Question: What are shipping times to California?"
```

2. Validate Context at Each Turn
```yaml
spec:
  turns:
    - role: user
      content: "I'm having an issue"

    - role: user
      content: "With my recent order"
      assertions:
        - type: content_includes
          params:
            patterns: ["order"]
          message: "Should reference order context"
```

3. Test Edge Cases
```yaml
# Very long conversation
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: very-long-conversation
spec:
  task_type: support
  constraints:
    max_turns: 20
  turns:
    # ... 20+ turns
---
# Topic switching
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: topic-switching
spec:
  task_type: support
  turns:
    - role: user
      content: "Question about billing"
    - role: user
      content: "Actually, never mind, tell me about features"
---
# Ambiguous references
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: ambiguous-references
spec:
  task_type: support
  turns:
    - role: user
      content: "Tell me about plans"
    - role: user
      content: "What about that one?"
```
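These edge-case scenarios are shown without assertions; in practice you would usually attach one to the turn that exercises the edge case. A sketch for the ambiguous-references scenario, where the choice of "plan" as the expected reference is an assumption for illustration:

```yaml
# Sketch: assert that "that one" is resolved against the earlier plans topic
- role: user
  content: "What about that one?"
  assertions:
    - type: content_includes
      params:
        patterns: ["plan"]
      message: "Should resolve the ambiguous reference to the plans discussed earlier"
```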
4. Use Context Metadata for Complex State

```yaml
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
  name: resume-conversation
spec:
  task_type: support

  context_metadata:
    previous_topic: "billing"
    unresolved_issues: ["payment failed"]
    user_mood: "frustrated"

  turns:
    - role: user
      content: "Let's continue where we left off"
```
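To verify that the seeded metadata actually influences the reply, the resumed turn can carry an assertion. A sketch; the expectation that the assistant should mention the earlier billing topic is an assumption:

```yaml
turns:
  - role: user
    content: "Let's continue where we left off"
    assertions:
      - type: content_includes
        params:
          patterns: ["billing"]
        message: "Should pick up the unresolved billing topic from context_metadata"
```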
Common Issues

Context Not Maintained

```bash
# Test with verbose logging
promptarena run --verbose --scenario memory-test

# Check if the prompt includes the conversation history
```

Assertions Too Strict

```yaml
# ❌ Too strict
assertions:
  - type: content_includes
    params:
      patterns: ["I understand you mentioned your order number earlier."]
```

```yaml
# ✅ Better
assertions:
  - type: content_includes
    params:
      patterns: ["order number"]
    message: "Should reference order"
```

Long Conversations Timeout

```bash
# Increase timeout for long conversations
promptarena run --timeout 300  # 5 minutes
```

Next Steps

You now know how to test complex multi-turn conversations!
Continue learning:
- Tutorial 4: MCP Tools - Test tool/function calling in conversations
- Tutorial 5: CI Integration - Automate conversation testing
- How-To: Write Scenarios - Advanced patterns
Try this:
- Create a 10+ turn conversation test
- Build a conversation decision tree
- Test conversation repair strategies
- Implement self-play testing
What’s Next?
In Tutorial 4, you'll learn how to test LLMs that use tools and function calling within conversations.