Run workflow scenarios as a regression suite
This how-to walks through examples/workflow-support/ and examples/workflow-order-processing/ — two scenarios that use PromptKit’s workflow primitive (state machine in the pack, workflow__transition tool the agent calls). Scenarios assert that the agent drives the expected lifecycle.
What it proves
Workflow-driven agents fail in ways pure single-turn eval can’t catch: the agent reaches the wrong terminal state, skips a required transition, or loops forever. Eval-only frameworks don’t have a workflow concept at all, so every demo of “agent state” ends up being a custom log scraper. PromptArena ships the state machine as a pack primitive, and assertions can observe the transitions.
Two examples ship with the runtime:
- examples/workflow-support/: three-state support workflow (intake → specialist → closed) with two scenarios covering different paths.
- examples/workflow-order-processing/: four-state order lifecycle (new_order → payment → fulfillment → complete) with full and partial-flow scenarios.
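The three failure modes named in this section (wrong terminal state, skipped transition, endless loop) can be made concrete with a small model. The following is a minimal Python sketch, not PromptKit code: the state names come from the support example, while the event names (`escalate`, `reassign`, `resolve`), the step limit, and the `classify_run` helper are hypothetical and exist only for illustration.

```python
# Hypothetical model of the three-state support workflow
# (intake -> specialist -> closed). Event names are assumptions,
# not taken from the example pack.
TRANSITIONS = {
    ("intake", "escalate"): "specialist",
    ("specialist", "reassign"): "specialist",  # self-loop, lets a run spin
    ("specialist", "resolve"): "closed",
}
TERMINAL = "closed"
MAX_STEPS = 10  # guard that turns "loops forever" into a detectable failure


def classify_run(events, start="intake"):
    """Replay events against the model and return 'ok' or a failure label."""
    state = start
    for step, event in enumerate(events):
        if step >= MAX_STEPS:
            return "loop"
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            # The event is not allowed from the current state,
            # e.g. resolving before a specialist was involved.
            return "skipped_or_invalid_transition"
        state = next_state
    return "ok" if state == TERMINAL else "wrong_terminal_state"
```

A run such as `classify_run(["escalate"])` stops in `specialist` and is reported as `wrong_terminal_state`, which is the kind of failure a single-turn check on the final message would not see.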
The assertion shape
Workflow scenarios drive a stateful agent across multiple turns. The agent calls workflow__transition to advance the state machine; the assertions observe that those calls happened.
Working pattern (used in both examples after this PR):
    conversation_assertions:
      - type: tools_called
        params:
          tool_names: [workflow__transition]
          min_calls: 3
        message: "Agent must drive three workflow transitions across the lifecycle"

      - type: tool_call_sequence
        params:
          sequence:
            - workflow__transition
            - workflow__transition
            - workflow__transition
        message: "Workflow transitions must occur in order"

tools_called checks the agent called the transition tool the expected number of times. tool_call_sequence enforces ordering. Both pass deterministically against the mock provider’s scripted responses.
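To make the two checks easier to reason about, here is a rough Python sketch of what they plausibly evaluate over the list of tool names the agent called. The function names mirror the assertion types, but the implementation is an assumption: in particular, treating the sequence as an in-order match that tolerates interleaved calls is a guess, and PromptKit's actual matching rules may differ. `lookup_order` is a hypothetical tool name.

```python
def tools_called(calls, tool_names, min_calls=1):
    """True if every named tool appears at least min_calls times in calls."""
    return all(calls.count(name) >= min_calls for name in tool_names)


def tool_call_sequence(calls, sequence):
    """True if sequence occurs in calls in order.

    Assumption: other tool calls may be interleaved between matches.
    """
    remaining = iter(calls)
    # `x in iterator` consumes the iterator up to the first match,
    # so each expected call must appear after the previous one.
    return all(expected in remaining for expected in sequence)


# Tool calls recorded across a conversation (illustrative data).
calls = [
    "workflow__transition",
    "lookup_order",  # hypothetical non-workflow tool
    "workflow__transition",
    "workflow__transition",
]
lifecycle = ["workflow__transition"] * 3

full_run_ok = tools_called(calls, ["workflow__transition"], min_calls=3)
ordered_ok = tool_call_sequence(calls, lifecycle)
stalled_ok = tools_called(calls[:2], ["workflow__transition"], min_calls=3)
```

Under these assumptions the full run satisfies both checks, while the truncated run (`calls[:2]`) fails the count check, which corresponds to an agent that stopped driving the state machine partway through.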
Known limitation: state-aware assertions
PromptKit ships richer state-aware assertions (state_is, transitioned_to, workflow_complete, workflow_transition_order) that operate on the workflow metadata directly. These are currently affected by a bridge bug (#1169): the workflow metadata isn’t reaching the eval context, so the assertions report “no workflow state in context.”
Both examples used to assert via those handlers and now use the tool-call shape above as a workaround. The state machine still runs (you can see the transitions in the log and the HTML report timeline); only the assertion-side bridge needs fixing. Once #1169 lands, the scenarios can move back to the state-aware assertion shape — the scenarios become tighter (asserting on actual state, not the tool call that drives it) but the runtime behaviour stays the same.
Running
    cd examples/workflow-support
    ../../bin/promptarena run --ci --formats html,json
    open out/report.html

The HTML report’s timeline view shows each transition with its from/to states and the event that triggered it. Even with the state-aware assertions disabled, the transition history is fully observable in the report.
Same shape for the order-processing example:
    cd examples/workflow-order-processing
    ../../bin/promptarena run --ci --formats html,json

What’s a regression suite, then?
Two concrete shapes:
- Did the agent drive the lifecycle? tools_called with min_calls matching the expected number of transitions. Catches “agent stopped triggering the state machine” regressions.
- In the right order? tool_call_sequence matching the expected lifecycle order. Catches “agent skipped a state” or “agent went backwards” regressions.
Combine with content checks (content_includes, content_excludes) on each turn for “agent confirmed payment received” / “agent provided tracking number” assertions, and the suite gives a deterministic deploy gate for any prompt change that might break the workflow.
CI gate
    # .github/workflows/workflow-regression.yml
    name: Workflow regression

    on:
      pull_request:
        paths:
          - 'examples/workflow-support/**'
          - 'examples/workflow-order-processing/**'

    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-go@v5
            with:
              go-version: '1.26'
          - run: make build-arena
          - name: Run workflow scenarios
            run: |
              for dir in examples/workflow-support examples/workflow-order-processing; do
                (cd "$dir" && ../../bin/promptarena run --ci --formats json) || exit 1
              done

Keyless: both examples use mock providers with scripted workflow__transition calls.
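For running the same gate locally before pushing, the shell loop in the workflow can be mirrored in a few lines of Python. This is a sketch under stated assumptions: it is meant to be run from the repository root after `make build-arena`, and the `run_suites` helper and its injectable `runner` parameter are hypothetical conveniences, not part of PromptArena. The command and its flags are the ones used in the workflow.

```python
import subprocess

EXAMPLE_DIRS = ["examples/workflow-support", "examples/workflow-order-processing"]
# Same command the CI step runs inside each example directory.
COMMAND = ["../../bin/promptarena", "run", "--ci", "--formats", "json"]


def run_suites(dirs=EXAMPLE_DIRS, runner=subprocess.run):
    """Run each example in its own directory.

    Returns the first directory whose run exits non-zero, or None if
    every run succeeds. `runner` is injectable so the loop can be
    exercised without the real binary.
    """
    for directory in dirs:
        result = runner(COMMAND, cwd=directory)
        if result.returncode != 0:
            return directory  # stop at the first failure, like `|| exit 1`
    return None
```

Calling `run_suites()` and exiting non-zero when it returns a directory gives the same fail-fast behaviour as the CI step.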
Extending it
- Add a new state: drop it in the pack’s workflow: block with on_event: mappings. Add a scenario that exercises the new path. Assertion shape stays the same; extend min_calls and the sequence to match.
- Test state isolation: workflow_tool_access (once #1169 is fixed) lets you assert that the agent only uses certain tools in certain states. Today, fall back to combining tools_called per-turn with when: clauses.
- Multi-path scenarios: register multiple scenarios that exercise different lifecycle branches (e.g., happy-path, escalation-path, error-path). The same tool_call_sequence shape works per scenario.
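When a state is added or a new branch gets its own scenario, the numbers in the assertions have to follow the path. A small sketch of that bookkeeping in Python, assuming the agent makes exactly one workflow__transition call per state change; the `assertion_params` helper and the extra `returned` state are hypothetical, and the returned dictionaries only echo the parameter names used in the assertion examples.

```python
TOOL = "workflow__transition"


def assertion_params(path):
    """Derive tools_called / tool_call_sequence params for a lifecycle path.

    `path` is the ordered list of states a scenario should visit.
    Assumption: one transition tool call per edge between states.
    """
    transitions = len(path) - 1
    if transitions < 1:
        raise ValueError("a lifecycle path needs at least two states")
    return {
        "tools_called": {"tool_names": [TOOL], "min_calls": transitions},
        "tool_call_sequence": {"sequence": [TOOL] * transitions},
    }


# Order lifecycle from the example: four states, three transitions.
order_path = ["new_order", "payment", "fulfillment", "complete"]
order_params = assertion_params(order_path)

# Extending the lifecycle with a hypothetical extra state adds one call.
extended_params = assertion_params(order_path + ["returned"])
```

For the four-state order path this yields `min_calls` of 3 and a three-entry sequence; appending one state raises both to 4, so each scenario's expectations stay in step with its path.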
Related how-tos
- Voice IVR with workflow state machine: same workflow primitive paired with the voice harness. Uses the same tools_called-based assertion pattern.
- The voice-ivr scenario is the reference for the workflow + voice pairing; the workflow-support / workflow-order-processing scenarios are the reference for the text-only workflow lifecycle pattern.