Tool Authoring
How to write Tool YAMLs for PromptArena: which mode to use, and how to wire each one.
For the full Tool schema (every field), see Configuration Schema → Tool or schemas/v1alpha1/tool.json in the repo.
Modes at a glance
Section titled “Modes at a glance”mode | Use when | Required fields |
|---|---|---|
mock (static) | Response is the same regardless of args | mock_result |
mock (template) | Response depends on args (e.g. branch on order_id) | mock_template or mock_template_file |
live | Tool calls a real HTTP endpoint | http: |
mcp | Tool is exposed by an MCP server already configured at the arena level | (none — auto-discovered) |
exec | Tool shells out to a local subprocess | exec: |
client | Tool is handled by client code (SDK consumer or external runtime) | client: |
mock_result and mock_template are mutually exclusive on a single tool.
Mock — static (mock_result)
Section titled “Mock — static (mock_result)”Returns the same response for every call. Use this when the value is a constant or when no test case in your suite needs to differentiate.
apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Toolmetadata: name: get-weatherspec: name: get_weather description: Get the current weather for a city. mode: mock timeout_ms: 1000 input_schema: type: object properties: city: { type: string } required: [city] output_schema: type: object properties: temperature_c: { type: number } conditions: { type: string } mock_result: temperature_c: 18 conditions: cloudyMock — input-branching (mock_template)
Section titled “Mock — input-branching (mock_template)”Returns a different response based on tool-call args. Args are parsed as a JSON map and exposed as the template’s data context. The rendered output is parsed back as JSON.
This is the right answer when:
- One scenario should look up a real order, another should fail to find it
- A “happy path” persona needs
in_warranty: truewhile a “hostile” persona needsfalse - You want to keep all branching logic in YAML rather than writing code
Do not write a custom executor for this case — the template executor already exists.
apiVersion: promptkit.altairalabs.ai/v1alpha1kind: Toolmetadata: name: lookup-orderspec: name: lookup_order description: Look up an order by ID. mode: mock timeout_ms: 1000 input_schema: type: object properties: order_id: { type: string } required: [order_id] output_schema: type: object properties: order_id: { type: string } in_warranty: { type: boolean } mock_template: | {{- if eq .order_id "ORD-2024-9999" -}} {"order_id":"ORD-2024-9999","in_warranty":true} {{- else if eq .order_id "ORD-2023-7788" -}} {"order_id":"ORD-2023-7788","in_warranty":false} {{- else -}} {"error":"not_found"} {{- end -}}Template language
Section titled “Template language”mock_template is rendered with Go’s text/template (Option("missingkey=zero")). The args map is the data context, so .order_id accesses the order_id field of the call.
Supported control flow includes:
{{ if eq .field "value" }}…{{ else if … }}…{{ else }}…{{ end }}{{ range .items }}…{{ end }}- Comparison helpers:
eq,ne,lt,gt,le,ge printf,index, and the rest of the standard template functions
The {{- … -}} form trims surrounding whitespace, which is what you want when the rendered output must parse as JSON.
Long templates: mock_template_file
Section titled “Long templates: mock_template_file”For templates that don’t fit comfortably inline, point at a file (path is relative to the tool YAML):
spec: mode: mock mock_template_file: templates/lookup-order.tmplMultimodal mocks: mock_parts
Section titled “Multimodal mocks: mock_parts”For tools that should return image/audio/video/document content alongside JSON, add mock_parts (works with both mock_result and mock_template). See Configuration Schema → Tool for the full structure.
Live — HTTP (mode: live)
Section titled “Live — HTTP (mode: live)”Calls a real HTTP endpoint. Args are sent as JSON; response is the parsed JSON body.
spec: mode: live http: url: https://api.example.com/orders/lookup method: POST headers: Content-Type: "application/json" headers_from_env: - API_TOKEN # → "Authorization: Bearer ${API_TOKEN}" timeout_ms: 5000 redact: # fields stripped from logs - api_keyMCP — discovered tools (mode: mcp)
Section titled “MCP — discovered tools (mode: mcp)”The tool is provided by an MCP server configured at the arena level. The arena auto-discovers tools from configured servers; the Tool YAML just declares the contract.
spec: mode: mcp # No additional config — the MCP client provides the executor.Exec — subprocess (mode: exec)
Section titled “Exec — subprocess (mode: exec)”Calls a local subprocess; args are sent on stdin, response is read from stdout.
spec: mode: exec exec: command: ./bin/lookup-order args: ["--format=json"] timeout_ms: 5000Client — handled outside the runtime (mode: client)
Section titled “Client — handled outside the runtime (mode: client)”The runtime hands the tool call back to the SDK consumer (or an external system) for execution. Used when the executor lives outside the test harness — e.g. a real backend you want the LLM to call, but where Arena should not own the implementation.
spec: mode: client client: timeout_ms: 5000 categories: [filesystem] consent: required: true message: "Allow the agent to read your filesystem?" decline_strategy: error validate_output: trueChoosing a mode
Section titled “Choosing a mode”Need a deterministic test fixture?├─ Same response every call → mock + mock_result└─ Response should depend on args → mock + mock_template
Want the LLM to hit a real system?├─ HTTP API I control → live + http├─ Tool provided by an MCP server → mcp├─ Local CLI / script → exec└─ Caller (SDK / app) handles it → clientSee also
Section titled “See also”- Configuration Schema → Tool — full field reference
examples/voice-refund-demo—mock_templatebranching across three personasexamples/customer-support—mode: mockwith static results