Tutorial 4: Testing MCP Tools
Learn how to test LLMs that use Model Context Protocol (MCP) tools and function calling.
What You’ll Learn
Section titled “What You’ll Learn”- Configure MCP tool servers
- Test tool/function calling
- Validate tool arguments
- Mock tool responses for testing
- Debug tool integration issues
Prerequisites
Section titled “Prerequisites”- Completed Tutorial 1-3
- Understanding of function calling in LLMs
- Node.js installed (for MCP servers)
What are MCP Tools?
Section titled “What are MCP Tools?”Model Context Protocol (MCP) enables LLMs to interact with external systems:
- Database queries: Read/write data
- API calls: External service integration
- File operations: Read/write files
- System commands: Execute scripts
MCP standardizes how LLMs call tools across providers.
Step 1: Install MCP Server
Section titled “Step 1: Install MCP Server”# Install the MCP filesystem server (example)npm install -g @modelcontextprotocol/server-filesystem
# Or use PromptKit's built-in MCP memory servercd $GOPATH/src/github.com/altairalabs/promptkitgo install ./runtime/mcp/servers/memoryStep 2: Configure MCP Server
Section titled “Step 2: Configure MCP Server”MCP servers are configured directly in your Arena configuration. The tools they provide are auto-discovered.
Step 3: Configure Tools in Arena
Section titled “Step 3: Configure Tools in Arena”Edit arena.yaml:
apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Arenametadata: name: mcp-tools-test
spec: prompt_configs: - id: assistant file: prompts/assistant-with-tools.yaml
providers: - file: providers/openai.yaml
scenarios: - file: scenarios/tool-calling-test.yaml
# Add MCP server configuration mcp_servers: memory: command: mcp-memory-server args: [] env: LOG_LEVEL: infoStep 4: Create Tool-Enabled Prompt
Section titled “Step 4: Create Tool-Enabled Prompt”Create prompts/assistant-with-tools.yaml:
apiVersion: promptkit.altairalabs.ai/v1alpha1kind: PromptConfigmetadata: name: assistant-with-tools
spec: task_type: assistant
system_template: | You are a helpful assistant with access to memory storage tools.
When users ask you to remember information, use the store_memory tool. When users ask you to recall information, use the recall_memory tool.
Always confirm when you've stored or retrieved information.Step 5: Create Tool-Calling Test
Section titled “Step 5: Create Tool-Calling Test”Create scenarios/tool-calling-test.yaml:
apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Scenariometadata: name: basic-tool-calling labels: category: tools protocol: mcp
spec: task_type: assistant
turns: # Turn 1: Request to store information - role: user content: "Remember that my favorite color is blue" assertions: - type: tools_called params: tools: ["store_memory"] message: "Should call store_memory tool"
- type: content_includes params: patterns: ["remember"] message: "Should confirm storage"
# Turn 2: Request to recall information - role: user content: "What's my favorite color?" assertions: - type: tools_called params: tools: ["recall_memory"] message: "Should call recall_memory tool"
- type: content_includes params: patterns: ["blue"] message: "Should include recalled information"Step 6: Run Tool Tests
Section titled “Step 6: Run Tool Tests”# Run with tools enabledpromptarena run --scenario tool-calling-test
# View detailed tool executionpromptarena run --verbose --scenario tool-calling-testStep 7: Mock Tool Responses
Section titled “Step 7: Mock Tool Responses”For testing without real tool execution, create mock tool definitions:
Create tools/store-memory-mock.yaml:
apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Toolmetadata: name: store-memory-mock
spec: name: store_memory description: "Store information in memory"
input_schema: type: object properties: key: type: string description: "Memory key" value: type: string description: "Value to store" required: [key, value]
mode: mock mock_result: success: true message: "Stored successfully"Create tools/recall-memory-mock.yaml:
apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Toolmetadata: name: recall-memory-mock
spec: name: recall_memory description: "Recall stored information"
input_schema: type: object properties: key: type: string description: "Memory key to recall" required: [key]
mode: mock mock_template: | { "success": true, "value": "blue" }Update arena.yaml:
spec: # Use mock tools instead of MCP servers for testing tools: - file: tools/store-memory-mock.yaml - file: tools/recall-memory-mock.yamlStep 8: Complex Tool Scenarios
Section titled “Step 8: Complex Tool Scenarios”Sequential Tool Calls
Section titled “Sequential Tool Calls”apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Scenariometadata: name: multiple-tool-operations labels: category: tools complexity: complex
spec: task_type: assistant
turns: - role: user content: "Remember: my name is Alice, email is alice@example.com, and I'm a developer" assertions: - type: tools_called params: tools: ["store_memory"] message: "Should call store_memory multiple times"Conditional Tool Use
Section titled “Conditional Tool Use”apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Scenariometadata: name: conditional-tool-calling labels: category: conditional
spec: task_type: assistant
turns: # Scenario where no tool is needed - role: user content: "What's 2+2?" assertions: - type: content_includes params: patterns: ["4"] message: "Should answer directly"
# Scenario where tool is needed - role: user content: "Look up the weather in San Francisco" assertions: - type: tools_called params: tools: ["get_weather"] message: "Should call weather tool"Error Handling
Section titled “Error Handling”apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Scenariometadata: name: tool-error-handling labels: category: error-handling
spec: task_type: assistant
turns: - role: user content: "Recall my favorite food" assertions: - type: tools_called params: tools: ["recall_memory"] message: "Should attempt to recall"
- type: content_includes params: patterns: ["don't have"] message: "Should handle gracefully when not found"Step 9: Testing Different Tool Types
Section titled “Step 9: Testing Different Tool Types”Database Tools
Section titled “Database Tools”apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Scenariometadata: name: database-query labels: category: database
spec: task_type: assistant
turns: - role: user content: "Find all users with role 'admin'" assertions: - type: tools_called params: tools: ["query_database"] message: "Should query database"
- type: content_includes params: patterns: ["admin"] message: "Should mention admin users"API Integration
Section titled “API Integration”apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Scenariometadata: name: external-api-call labels: category: api
spec: task_type: assistant
turns: - role: user content: "Get the current Bitcoin price" assertions: - type: tools_called params: tools: ["fetch_crypto_price"] message: "Should call crypto API"
- type: content_includes params: patterns: ["Bitcoin"] message: "Should mention Bitcoin"File Operations
Section titled “File Operations”apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Scenariometadata: name: file-read-operation labels: category: filesystem
spec: task_type: assistant
turns: - role: user content: "Read the contents of data.json" assertions: - type: tools_called params: tools: ["read_file"] message: "Should call read_file"Step 10: Advanced Tool Testing
Section titled “Step 10: Advanced Tool Testing”Tool Call Chains
Section titled “Tool Call Chains”Test when one tool call leads to another:
apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Scenariometadata: name: tool-call-chain labels: category: chain
spec: task_type: assistant
turns: - role: user content: "Find Alice's email and send her a welcome message" assertions: - type: tools_called params: tools: ["lookup_user", "send_email"] message: "Should call both tools in sequence"Parallel Tool Calls
Section titled “Parallel Tool Calls”apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Scenariometadata: name: parallel-tool-execution labels: category: parallel
spec: task_type: assistant
turns: - role: user content: "Check the weather in New York, London, and Tokyo" assertions: - type: tools_called params: tools: ["get_weather"] message: "Should call weather tool for multiple locations"Debugging Tool Issues
Section titled “Debugging Tool Issues”Check Tool Configuration
Section titled “Check Tool Configuration”# Inspect tool configurationpromptarena config-inspect --verbose
# Should show loaded toolsVerbose Tool Execution
Section titled “Verbose Tool Execution”# See detailed tool calls and responsespromptarena run --verbose --scenario tool-calling-test
# Output shows:# [TOOL CALL] store_memory({"key": "favorite_color", "value": "blue"})# [TOOL RESPONSE] {"success": true, "message": "Stored successfully"}Debug MCP Server
Section titled “Debug MCP Server”# Test MCP server directlyecho '{"method": "tools/list"}' | mcp-memory-server
# Check server logsexport LOG_LEVEL=debugpromptarena run --scenario tool-testTool Testing Best Practices
Section titled “Tool Testing Best Practices”1. Test Tool Selection
Section titled “1. Test Tool Selection”# Verify correct tool is chosenassertions: - type: tools_called params: tools: ["correct_tool_name"] message: "Should call the right tool"2. Validate Tool Calls
Section titled “2. Validate Tool Calls”# Check that tools are called appropriatelyassertions: - type: tools_called params: tools: ["expected_tool"] message: "Should use the expected tool"3. Mock External Dependencies
Section titled “3. Mock External Dependencies”# Use mock tools for external servicesapiVersion: promptkit.altairalabs.ai/v1alpha1kind: Toolmetadata: name: mock-external-api
spec: name: external_api description: "Mock external API" mode: mock mock_result: status: "success"4. Test Error Scenarios
Section titled “4. Test Error Scenarios”apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Scenariometadata: name: tool-failure-handling
spec: task_type: assistant
turns: - role: user content: "Do something that requires a tool" assertions: - type: content_includes params: patterns: ["error"] message: "Should handle tool errors gracefully"Common Issues
Section titled “Common Issues”Tool Not Called
Section titled “Tool Not Called”# Check if tools are enabled in promptcat prompts/assistant-with-tools.yaml | grep tools_enabled
# Should be: tools_enabled: trueWrong Tool Arguments
Section titled “Wrong Tool Arguments”# View actual tool callscat out/results.json | jq '.results[] | select(.tool_calls != null) | { tool: .tool_calls[].name, args: .tool_calls[].arguments}'MCP Server Connection Failed
Section titled “MCP Server Connection Failed”# Verify MCP server is runningps aux | grep mcp
# Test MCP server directlymcp-memory-server --helpNext Steps
Section titled “Next Steps”You now know how to test LLMs with tool calling!
Continue learning:
- Tutorial 5: CI Integration - Automate tool testing in CI/CD
- How-To: MCP Tools - Advanced tool configuration
- Runtime: Tools & MCP - Complete tool reference
Try this:
- Create custom MCP tools
- Test tool calling across multiple providers
- Build a tool call chain test
- Mock complex external APIs
What’s Next?
Section titled “What’s Next?”In Tutorial 5, you’ll learn how to integrate all these tests into your CI/CD pipeline for automated quality gates.