Tutorial 4: Testing MCP Tools
Learn how to test LLMs that use Model Context Protocol (MCP) tools and function calling.
What You’ll Learn
- Configure MCP tool servers
- Test tool/function calling
- Validate tool arguments
- Mock tool responses for testing
- Debug tool integration issues
Prerequisites
- Completed Tutorial 1-3
- Understanding of function calling in LLMs
- Node.js installed (for MCP servers)
What are MCP Tools?
Model Context Protocol (MCP) enables LLMs to interact with external systems:
- Database queries: Read/write data
- API calls: External service integration
- File operations: Read/write files
- System commands: Execute scripts
MCP standardizes how LLMs call tools across providers.
Step 1: Install MCP Server
# Install the MCP filesystem server (example)
npm install -g @modelcontextprotocol/server-filesystem
# Or use PromptKit's built-in MCP memory server
cd $GOPATH/src/github.com/altairalabs/promptkit
go install ./runtime/mcp/servers/memory
Step 2: Configure MCP Server
MCP servers are configured directly in your Arena configuration. The tools they provide are auto-discovered.
Step 3: Configure Tools in Arena
Edit arena.yaml:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Arena
metadata:
name: mcp-tools-test
spec:
prompt_configs:
- id: assistant
file: prompts/assistant-with-tools.yaml
providers:
- file: providers/openai.yaml
scenarios:
- file: scenarios/tool-calling-test.yaml
# Add MCP server configuration
mcp_servers:
memory:
command: mcp-memory-server
args: []
env:
LOG_LEVEL: info
Step 4: Create Tool-Enabled Prompt
Create prompts/assistant-with-tools.yaml:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: PromptConfig
metadata:
name: assistant-with-tools
spec:
task_type: assistant
system_template: |
You are a helpful assistant with access to memory storage tools.
When users ask you to remember information, use the store_memory tool.
When users ask you to recall information, use the recall_memory tool.
Always confirm when you've stored or retrieved information.
Step 5: Create Tool-Calling Test
Create scenarios/tool-calling-test.yaml:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: basic-tool-calling
labels:
category: tools
protocol: mcp
spec:
task_type: assistant
turns:
# Turn 1: Request to store information
- role: user
content: "Remember that my favorite color is blue"
assertions:
- type: tools_called
params:
tools: ["store_memory"]
message: "Should call store_memory tool"
- type: content_includes
params:
patterns: ["remember"]
message: "Should confirm storage"
# Turn 2: Request to recall information
- role: user
content: "What's my favorite color?"
assertions:
- type: tools_called
params:
tools: ["recall_memory"]
message: "Should call recall_memory tool"
- type: content_includes
params:
patterns: ["blue"]
message: "Should include recalled information"
Step 6: Run Tool Tests
# Run with tools enabled
promptarena run --scenario tool-calling-test
# View detailed tool execution
promptarena run --verbose --scenario tool-calling-test
Step 7: Mock Tool Responses
For testing without real tool execution, create mock tool definitions:
Create tools/store-memory-mock.yaml:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Tool
metadata:
name: store-memory-mock
spec:
name: store_memory
description: "Store information in memory"
input_schema:
type: object
properties:
key:
type: string
description: "Memory key"
value:
type: string
description: "Value to store"
required: [key, value]
mode: mock
mock_result:
success: true
message: "Stored successfully"
Create tools/recall-memory-mock.yaml:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Tool
metadata:
name: recall-memory-mock
spec:
name: recall_memory
description: "Recall stored information"
input_schema:
type: object
properties:
key:
type: string
description: "Memory key to recall"
required: [key]
mode: mock
mock_template: |
{
"success": true,
"value": "blue"
}
Update arena.yaml:
spec:
# Use mock tools instead of MCP servers for testing
tools:
- file: tools/store-memory-mock.yaml
- file: tools/recall-memory-mock.yaml
Step 8: Complex Tool Scenarios
Sequential Tool Calls
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: multiple-tool-operations
labels:
category: tools
complexity: complex
spec:
task_type: assistant
turns:
- role: user
content: "Remember: my name is Alice, email is alice@example.com, and I'm a developer"
assertions:
- type: tools_called
params:
tools: ["store_memory"]
message: "Should call store_memory multiple times"
Conditional Tool Use
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: conditional-tool-calling
labels:
category: conditional
spec:
task_type: assistant
turns:
# Scenario where no tool is needed
- role: user
content: "What's 2+2?"
assertions:
- type: content_includes
params:
patterns: ["4"]
message: "Should answer directly"
# Scenario where tool is needed
- role: user
content: "Look up the weather in San Francisco"
assertions:
- type: tools_called
params:
tools: ["get_weather"]
message: "Should call weather tool"
Error Handling
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: tool-error-handling
labels:
category: error-handling
spec:
task_type: assistant
turns:
- role: user
content: "Recall my favorite food"
assertions:
- type: tools_called
params:
tools: ["recall_memory"]
message: "Should attempt to recall"
- type: content_includes
params:
patterns: ["don't have"]
message: "Should handle gracefully when not found"
Step 9: Testing Different Tool Types
Database Tools
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: database-query
labels:
category: database
spec:
task_type: assistant
turns:
- role: user
content: "Find all users with role 'admin'"
assertions:
- type: tools_called
params:
tools: ["query_database"]
message: "Should query database"
- type: content_includes
params:
patterns: ["admin"]
message: "Should mention admin users"
API Integration
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: external-api-call
labels:
category: api
spec:
task_type: assistant
turns:
- role: user
content: "Get the current Bitcoin price"
assertions:
- type: tools_called
params:
tools: ["fetch_crypto_price"]
message: "Should call crypto API"
- type: content_includes
params:
patterns: ["Bitcoin"]
message: "Should mention Bitcoin"
File Operations
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: file-read-operation
labels:
category: filesystem
spec:
task_type: assistant
turns:
- role: user
content: "Read the contents of data.json"
assertions:
- type: tools_called
params:
tools: ["read_file"]
message: "Should call read_file"
Step 10: Advanced Tool Testing
Tool Call Chains
Test when one tool call leads to another:
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: tool-call-chain
labels:
category: chain
spec:
task_type: assistant
turns:
- role: user
content: "Find Alice's email and send her a welcome message"
assertions:
- type: tools_called
params:
tools: ["lookup_user", "send_email"]
message: "Should call both tools in sequence"
Parallel Tool Calls
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: parallel-tool-execution
labels:
category: parallel
spec:
task_type: assistant
turns:
- role: user
content: "Check the weather in New York, London, and Tokyo"
assertions:
- type: tools_called
params:
tools: ["get_weather"]
message: "Should call weather tool for multiple locations"
Debugging Tool Issues
Check Tool Configuration
# Inspect tool configuration
promptarena config-inspect --verbose
# Should show loaded tools
Verbose Tool Execution
# See detailed tool calls and responses
promptarena run --verbose --scenario tool-calling-test
# Output shows:
# [TOOL CALL] store_memory({"key": "favorite_color", "value": "blue"})
# [TOOL RESPONSE] {"success": true, "message": "Stored successfully"}
Debug MCP Server
# Test MCP server directly
echo '{"method": "tools/list"}' | mcp-memory-server
# Check server logs
export LOG_LEVEL=debug
promptarena run --scenario tool-test
Tool Testing Best Practices
1. Test Tool Selection
# Verify correct tool is chosen
assertions:
- type: tools_called
params:
tools: ["correct_tool_name"]
message: "Should call the right tool"
2. Validate Tool Calls
# Check that tools are called appropriately
assertions:
- type: tools_called
params:
tools: ["expected_tool"]
message: "Should use the expected tool"
3. Mock External Dependencies
# Use mock tools for external services
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Tool
metadata:
name: mock-external-api
spec:
name: external_api
description: "Mock external API"
mode: mock
mock_result:
status: "success"
4. Test Error Scenarios
apiVersion: promptkit.altairalabs.ai/v1alpha1
kind: Scenario
metadata:
name: tool-failure-handling
spec:
task_type: assistant
turns:
- role: user
content: "Do something that requires a tool"
assertions:
- type: content_includes
params:
patterns: ["error"]
message: "Should handle tool errors gracefully"
Common Issues
Tool Not Called
# Check if tools are enabled in prompt
cat prompts/assistant-with-tools.yaml | grep tools_enabled
# Should be: tools_enabled: true
Wrong Tool Arguments
# View actual tool calls
cat out/results.json | jq '.results[] | select(.tool_calls != null) | {
tool: .tool_calls[].name,
args: .tool_calls[].arguments
}'
MCP Server Connection Failed
# Verify MCP server is running
ps aux | grep mcp
# Test MCP server directly
mcp-memory-server --help
Next Steps
You now know how to test LLMs with tool calling!
Continue learning:
- Tutorial 5: CI Integration - Automate tool testing in CI/CD
- How-To: MCP Tools - Advanced tool configuration
- Runtime: Tools & MCP - Complete tool reference
Try this:
- Create custom MCP tools
- Test tool calling across multiple providers
- Build a tool call chain test
- Mock complex external APIs
What’s Next?
In Tutorial 5, you’ll learn how to integrate all these tests into your CI/CD pipeline for automated quality gates.