Test multi-agent A2A delegation patterns
This how-to walks through examples/multi-agent-demo/ — a supervisor assistant routes user requests to two specialised A2A agents (research, translation). The scenario asserts the delegation pattern: research queries hit the research agent, translation queries hit the translation agent, and both produce the expected content.
What it proves
Section titled “What it proves”Multi-agent systems fail in two distinct ways:
- Routing failure — the supervisor delegates to the wrong specialist (or no specialist at all).
- Specialist failure — the right specialist gets called but produces unusable output.
Pure single-agent eval misses the first; pure agent-output eval misses the supervisor’s contribution. PromptArena asserts both: which agent got invoked AND what the agent returned.
The pack registers two mock A2A agents (research_agent, translation_agent). The supervisor (a single assistant prompt) decides which to call based on the user’s request. Assertions check both the routing and the specialist response.
The assertion shape
Section titled “The assertion shape”examples/multi-agent-demo/scenarios/multi-agent-delegation.yaml:
turns: - role: user content: "Search for papers about quantum computing" assertions: - type: agent_invoked params: agents: - a2a__research_agent__search_papers - type: agent_response_contains params: agent: a2a__research_agent__search_papers contains: "Quantum Computing Fundamentals"
- role: user content: "Translate the summary to French" assertions: - type: agent_invoked params: agents: - a2a__translation_agent__translate - type: agent_response_contains params: agent: a2a__translation_agent__translate contains: "informatique quantique"
conversation_assertions: - type: agent_invoked params: agents: - a2a__research_agent__search_papers - a2a__translation_agent__translatePer-turn agent_invoked checks routing. Per-turn agent_response_contains checks specialist content. Conversation-level agent_invoked checks both agents were exercised across the run.
Mock A2A agents
Section titled “Mock A2A agents”The pack defines the agents inline with responses: blocks that match on input:
a2a_agents: - name: research_agent card: name: Research Agent description: A mock research agent that searches for academic papers skills: - id: search_papers name: Search Papers tags: [research, papers, search] responses: - skill: search_papers match: contains: quantum response: parts: - text: | Found 3 papers on quantum computing: 1. Quantum Computing Fundamentals (2024) ...Each response can match on input fragments, fall back to a default, or return structured parts. The mock A2A server runs in-process; no external server, no Docker, no provider keys.
Run it
Section titled “Run it”cd examples/multi-agent-demopromptarena serveHeadless / CI:
promptarena run --ci --formats html,jsonKeyless: both the supervisor (mock-assistant) and the A2A agents are mocked.
Switching to real A2A servers
Section titled “Switching to real A2A servers”To test against a real A2A server (running locally or remote):
- Drop the inline
a2a_agents:block. - Add
a2a_servers:pointing at the real endpoint URLs. - The supervisor and assertions stay the same — they don’t know whether the A2A endpoint is mocked or remote.
See Test A2A Agents for the full reference covering authentication, headers, and skill filtering.
Adding agents
Section titled “Adding agents”- Add a third specialist: drop another
a2a_agents:entry inconfig.arena.yamlwith its skills + canned responses. Add a turn to the scenario asserting on the new agent. - Stricter ordering:
tool_call_sequenceover thea2a__*tool names asserts the supervisor called the agents in a specific order. - Tool-arg assertions:
tool_calls_with_argslets you assert the supervisor passed the right arguments to each agent (e.g., the translation agent must receivetarget_language: French).
CI gate
Section titled “CI gate”# .github/workflows/multi-agent-demo.ymlname: A2A multi-agent
on: pull_request: paths: - 'examples/multi-agent-demo/**'
jobs: test: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: actions/setup-go@v5 with: go-version: '1.26' - run: make build-arena - name: Run multi-agent demo working-directory: examples/multi-agent-demo run: ../../bin/promptarena run --ci --formats jsonKeyless and fork-safe. The mock A2A servers run in-process.