Skip to content

Arena How-To

Practical guides for accomplishing specific tasks with PromptArena.

### [Install PromptArena](/arena/how-to/installation/) Set up PromptArena on your system and verify the installation.
### [Configure Shell Completions](/arena/how-to/shell-completions/) Enable tab completion for commands, flags, and dynamic values like scenarios and providers.
### [Use Project Templates](/arena/how-to/use-project-templates/) Quickly scaffold new test projects with the `promptarena init` command. Includes 6 built-in templates for common use cases like customer support, code generation, content creation, multimodal AI, and MCP integration.
### [Write Test Scenarios](/arena/how-to/write-scenarios/) Create and structure test scenarios for LLM testing with the PromptPack format.
### [Configure LLM Providers](/arena/how-to/configure-providers/) Set up and manage connections to OpenAI, Anthropic, Google, and other LLM providers.
### [Use Mock Providers](/arena/how-to/use-mock-providers/) Test quickly and cost-free with mock providers instead of real LLM APIs.
### [Validate Outputs](/arena/how-to/validate-outputs/) Use assertions and custom validators to verify LLM response quality.
### [Use guardrails as test signals (the three-role model)](/arena/how-to/guardrails-as-signals/) Walk through `examples/guardrails-test/`. One primitive enforces in production AND fires as an observable test signal. The canonical demo of the eval / guardrail / assertion bridge.
### [Run workflow scenarios as a regression suite](/arena/how-to/workflow-regression/) Walk through `examples/workflow-support/` and `examples/workflow-order-processing/`. State machines as first-class test subjects: drive the agent through the lifecycle, assert the transitions, gate merges on the workflow reaching the expected end state.
### [Generate Mock Responses from Arena Results](/arena/how-to/generate-mock-responses-from-arena/) Turn recorded Arena runs into mock provider YAML for deterministic, cost-free replays.
### [Gate model migrations on a regression suite](/arena/how-to/model-migration/) Walk through `examples/model-migration/`: run the same scenarios against the old and new model side by side. CI exits non-zero if the new model breaks an assertion the old one passed.
### [Set Up Voice Testing with Self-Play](/arena/how-to/setup-voice-testing/) Configure automated voice testing using duplex streaming and self-play with TTS.
### [Test a Voice Customer Support Agent](/arena/how-to/voice-customer-support/) Walk through `examples/voice-refund-demo/`: four scripted personas (hostile, impersonator, anxious, patient) driving a refund agent under voice, with conversation-level assertions on the tools the agent must (and must not) call.
### [Test a Voice IVR with a Workflow State Machine](/arena/how-to/voice-ivr/) Walk through `examples/voice-ivr/`: a workflow-driven bank IVR that routes callers via state transitions to self-service or human handoff. Pairs the workflow primitive with the voice harness and asserts the tool-call pattern of each path.
### [Assert per-turn latency budgets](/arena/how-to/voice-latency-budget/) Walk through `examples/voice-latency-budget/`: gate every turn against a `max_ms` budget. Arena bridges the assistant message's `LatencyMs` into eval context, so `latency_budget` reads real provider timing with no custom plumbing.
### [Test voice agents that call tools mid-conversation](/arena/how-to/voice-tool-calls/) Walk through `examples/duplex-streaming/scenarios/duplex-tools.scenario.yaml`: a busy-professional persona drives a voice agent through weather / calendar / reminder tool calls. Conversation-level assertions catch the tool-call pattern without leaving the audio pipeline.
### [Red-team a voice agent with safety guardrails](/arena/how-to/voice-red-team/) Walk through `examples/voice-red-team/`: `pii_leakage` wired as a guardrail in the pack, scenarios assert the firing via `guardrail_triggered`. Same primitive enforces in production AND fires as a test signal — the three-role pattern (eval / guardrail / assertion) end-to-end.
### [PII-redaction guardrails for voice agents](/arena/how-to/voice-guardrails/) Walk through `examples/voice-guardrails/`: a focused single-scenario demo of the runtime + test bridge. The `pii_leakage` guardrail replaces the agent's would-be-spoken PII before reaching TTS; the test reads the firing from `validations:` on the recorded message via `guardrail_triggered`.
### [Run the same scenario across multiple providers](/arena/how-to/voice-bake-off/) Walk through `examples/voice-bake-off/`: one scenario, two providers, side-by-side report. Adding a provider is one YAML line; per-provider thresholds use `when:` clauses. The fan-out shape stays the same whether you're comparing mocks or real duplex providers.
### [Test expressive voice personas with characterization tags](/arena/how-to/voice-characterization/) Walk through the expressive path in `examples/voice-refund-demo/`. Personas opt in with `expressive: true` and emit canonical bracket tags (`[shouts]`, `[whispers]`, `[laughs]`); each TTS provider adapter lowers them into its native dialect (ElevenLabs v3 native, OpenAI instructions, Cartesia emotion, SSML).
### [Session Recording](/arena/how-to/session-recording/) Capture detailed session recordings for debugging, replay, and analysis. Export audio tracks, correlate events with annotations, and use recordings for deterministic test replay.
### [Manage Context](/arena/how-to/manage-context/) Configure context management and truncation strategies for long conversations, including embedding-based relevance truncation.
### [Test MCP Tools](/arena/how-to/test-mcp-tools/) Configure MCP servers in Arena for integration testing with tool filtering, timeouts, and environment variables.
### [Test A2A Agents](/arena/how-to/test-a2a-agents/) Test agent-to-agent delegation with mock or remote A2A agents, including authentication, headers, and skill filtering.
### [Test agent negotiation with scripted or self-play opponents](/arena/how-to/text-negotiation/) Walk through `examples/text-negotiation/`: a four-turn rental-price negotiation with conversation-outcome assertions. Default runs deterministically against a mock landlord; the how-to documents the swap to real LLM-driven self-play via `role: selfplay-user` and a persona.
### [Integrate with CI/CD](/arena/how-to/integrate-ci-cd/) Automate LLM testing in GitHub Actions, GitLab CI, Jenkins, and other pipelines.
### [Run Arena as a CI quality gate](/arena/how-to/arena-ci-quality-gate/) Wire `promptarena run --ci` into GitHub Actions as a hard merge gate. Fork-safe defaults, real-provider keys via secrets, threshold-based pass/fail, report uploads for reviewers.

How-to guides are goal-oriented recipes that show you how to solve specific problems:

  • ✅ “How do I install Arena?”
  • ✅ “How do I configure multiple providers?”
  • ✅ “How do I integrate with GitHub Actions?”

Looking for something else?

  • Tutorials - Step-by-step learning paths for beginners
  • Explanation - Understand concepts and design decisions
  • Reference - Complete technical specifications and API docs