Generate Mock Responses from Arena Results

Convert recorded Arena JSON results into mock provider YAML and replay conversations without calling an external LLM. This is ideal for tightening CI feedback loops and keeping IoT maintenance demos deterministic.

Before you start, make sure you have:

  • Arena run outputs in out/*.json (generated by promptarena run ... --format json)
  • The Go toolchain installed (the CLI builds on demand)
  • A location in your workspace for the generated mock responses (e.g., providers/mock-generated.yaml or per-scenario files)

  1. Run Arena and capture JSON results

    promptarena run \
    --scenario hardware-faults \
    --provider openai-gpt4o \
    --format json \
    --out out
  2. Generate mocks from the recorded runs

    promptarena mocks generate \
    --input out \
    --scenario hardware-faults \
    --provider openai-gpt4o \
    --output providers/mock-generated.yaml \
    --merge
    • --input can be a single run file or a directory of JSON results.
    • --scenario / --provider filter which runs are included.
    • --merge overlays the generated responses onto an existing mock file instead of overwriting it.
  3. (Optional) Split per scenario

    promptarena mocks generate \
    --input out \
    --per-scenario \
    --output providers/responses \
    --merge

    This writes one YAML file per scenario under providers/responses/.

  4. (Optional) Preview without writing

    promptarena mocks generate --input out --dry-run

    This prints the generated YAML to stdout without writing any files.
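
Because the dry run sends the YAML to stdout, one possible use is a CI check that flags a committed mock file that no longer matches the recorded runs. The sketch below is only an illustration: the function name mocks_up_to_date is hypothetical, and it assumes that --dry-run emits nothing but the YAML on stdout and that generation produces the same ordering on every run.

```shell
# Hypothetical helper: compare freshly generated YAML (read from stdin)
# with a committed mock file given as the first argument.
# Exit status 0 means identical; non-zero means stale or missing.
mocks_up_to_date() {
  local committed="$1"
  if [ ! -f "$committed" ]; then
    echo "committed mock file not found: $committed" >&2
    return 1
  fi
  # diff exits 0 when the inputs match and 1 when they differ;
  # the unified diff it prints shows what changed.
  diff -u "$committed" -
}
```

It could be used as `promptarena mocks generate --input out --dry-run | mocks_up_to_date providers/mock-generated.yaml`, passing the same --scenario / --provider filters that produced the committed file. If the file was built up over several --merge runs, a single dry run may legitimately differ from it.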

For example, to generate mocks from the hardware-faults run artifacts in tools/arena/templates/testdata:

promptarena mocks generate \
--input tools/arena/templates/testdata \
--scenario hardware-faults \
--output iot-maintenance-demo/providers/responses/mock-assistant.yaml \
--merge

This refreshes the IoT maintenance demo mocks with real tool calls and responses captured from a prior OpenAI run.
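
Since --input accepts either a single run file or a directory of JSON results, a script may want to confirm that recorded results exist before generating. The following is a minimal sketch: has_json_results and generate_mocks are hypothetical helper names, and the check assumes result files sit directly in the directory with a .json extension. Only flags shown on this page are passed to the CLI.

```shell
# Hypothetical helper: succeed if the argument is a .json file,
# or a directory that directly contains at least one .json file.
has_json_results() {
  local input="$1" f
  if [ -f "$input" ]; then
    case "$input" in
      *.json) return 0 ;;
      *) return 1 ;;
    esac
  fi
  [ -d "$input" ] || return 1
  for f in "$input"/*.json; do
    # An unmatched glob stays literal, so confirm the file really exists.
    [ -f "$f" ] && return 0
  done
  return 1
}

# Hypothetical wrapper: generate mocks only when recorded results are present.
generate_mocks() {
  local input="$1" output="$2"
  if ! has_json_results "$input"; then
    echo "no JSON results in $input; run promptarena with --format json first" >&2
    return 1
  fi
  promptarena mocks generate --input "$input" --output "$output" --merge
}
```

With these defined, `generate_mocks tools/arena/templates/testdata iot-maintenance-demo/providers/responses/mock-assistant.yaml` would fail early with a readable message when the fixtures are missing.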

  • Add a --default-response if you want a fallback when no turn-specific response exists.
  • Keep recorded JSON fixtures under version control (tools/arena/templates/testdata/) so tests stay deterministic.
  • After generating mocks, run your flows with the mock provider to validate determinism before committing.
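
One way to approach the determinism check in the last tip is to run the same scenario twice against the mock provider, writing to two output directories, and then compare them. The helper below is a sketch under assumptions: results_identical is a hypothetical name, and a byte-for-byte comparison only works if the result files contain no timestamps, run IDs, or other per-run fields.

```shell
# Hypothetical helper: succeed when two result directories have identical
# contents (same file names, same bytes), comparing recursively.
results_identical() {
  local first="$1" second="$2"
  # diff -r exits 0 only when every file matches and none is missing.
  diff -r "$first" "$second" >/dev/null
}
```

For instance, after two promptarena run invocations that differ only in --out (say out-a and out-b) and that use whichever provider ID your mock provider is configured under, `results_identical out-a out-b` returns success when both runs produced the same files. If the outputs do carry per-run metadata, those fields would need to be stripped or ignored before comparing.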