Test agent negotiation with scripted or self-play opponents
This how-to walks through `examples/text-negotiation/`, a four-turn rental-price negotiation. The default config runs deterministically against a mock landlord; the how-to also documents how to swap in real LLM-driven self-play.
What it proves
Single-turn evals miss negotiation entirely: the failure mode lives in the trajectory, not in the quality of any one response. Did the agent cave under pressure? Did it secure the term commitment? Did it stop above its reservation price?
PromptArena tests negotiations as conversations:
- Both sides are first-class. The tenant side is either scripted (deterministic, for CI) or `selfplay-user` with a persona (real LLM-driven). The landlord side is your prompt config.
- Multi-turn is the default. Scenarios encode a sequence of user turns; `conversation_assertions` evaluate the end state.
- Outcome assertions catch what matters: did the deal land above the reservation, did the term commitment get secured, did either side make a forbidden capitulation move.
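To make "trajectory, not response" concrete, here is a minimal Go sketch of the three outcome questions evaluated over a whole transcript. It is not PromptArena's implementation: the `Turn` and `Outcome` types, the parsed `Offer` and `TermMonths` fields, and the 2400 reservation price are all hypothetical, chosen only to illustrate the idea.

```go
package main

import "fmt"

// Turn is a hypothetical, simplified transcript entry. Landlord turns
// carry the rent they quoted and any lease term they committed to.
type Turn struct {
	Role       string // "tenant" or "landlord"
	Content    string
	Offer      int // quoted monthly rent; 0 if none
	TermMonths int // committed lease length; 0 if none
}

// Outcome holds the trajectory-level answers.
type Outcome struct {
	StayedAboveReservation bool // no landlord offer ever dropped below the floor
	FinalOffer             int  // last rent the landlord put on the table
	TermSecured            bool // a 12-month (or longer) commitment was recorded
}

// evaluate walks every landlord turn, so a mid-conversation capitulation
// is caught even if the final offer looks acceptable.
func evaluate(turns []Turn, reservation int) Outcome {
	out := Outcome{StayedAboveReservation: true}
	for _, t := range turns {
		if t.Role != "landlord" {
			continue
		}
		if t.Offer > 0 {
			out.FinalOffer = t.Offer
			if t.Offer < reservation {
				out.StayedAboveReservation = false
			}
		}
		if t.TermMonths >= 12 {
			out.TermSecured = true
		}
	}
	return out
}

func main() {
	// Illustrative transcript; the 2400 floor is a hypothetical reservation price.
	turns := []Turn{
		{Role: "tenant", Content: "Would you take 2300?"},
		{Role: "landlord", Content: "I can do 2500.", Offer: 2500},
		{Role: "tenant", Content: "2400 with a 12-month lease?"},
		{Role: "landlord", Content: "2450 for twelve months.", Offer: 2450, TermMonths: 12},
	}
	fmt.Printf("%+v\n", evaluate(turns, 2400))
	// {StayedAboveReservation:true FinalOffer:2450 TermSecured:true}
}
```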
Run it
```bash
cd examples/text-negotiation
promptarena serve
```

`serve` loads the scenario. The TUI is good for the dev loop:

```bash
promptarena run --tui
```

Headless / CI:

```bash
promptarena run --ci --formats html,json
open out/report.html
```

Keyless: the default config uses a mock landlord with scripted responses.
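The keyless default is deterministic because the landlord's replies are scripted: the same tenant turns always produce the same transcript. A minimal Go sketch of that idea follows; the `ScriptedLandlord` type and its reply strings are hypothetical and do not describe how PromptArena's mock provider is actually configured.

```go
package main

import "fmt"

// ScriptedLandlord is a hypothetical stand-in for a mock provider:
// it ignores the tenant's wording and replays canned replies in order.
type ScriptedLandlord struct {
	replies []string
	next    int
}

// Respond returns the next scripted reply, repeating the last one
// if the conversation runs longer than the script.
func (s *ScriptedLandlord) Respond(tenantMsg string) string {
	if len(s.replies) == 0 {
		return ""
	}
	i := s.next
	if i >= len(s.replies) {
		i = len(s.replies) - 1
	} else {
		s.next++
	}
	return s.replies[i]
}

func main() {
	l := &ScriptedLandlord{replies: []string{
		"The listed rent is twenty-five hundred.",
		"I can't go to twenty-three hundred.",
		"Twenty-four fifty works with a twelve-month lease.",
	}}
	for _, msg := range []string{"2300?", "Still 2300?", "2450 for a year?"} {
		fmt.Println(l.Respond(msg))
	}
}
```

Because the reply depends only on the turn index, CI runs need no API key and never flake on model variance.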
The assertion shape
```yaml
conversation_assertions:
  - type: content_includes
    params:
      patterns: ["twenty-four fifty"]
    message: "Final deal must land at the negotiated price"

  - type: content_includes
    params:
      patterns: ["twelve-month"]
    message: "Final deal must include the 12-month commitment"

  - type: content_excludes
    params:
      patterns: ["I accept twenty-three hundred", "twenty-three hundred works"]
    message: "Landlord must not capitulate on the below-reservation offer"
```

All three run at conversation level: they check the end state, not per-turn behaviour. Negotiations succeed or fail by the deal that lands; per-turn checks tend to over-constrain the agent.
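One way to picture conversation-level `content_includes` / `content_excludes` is as pattern checks over the landlord's side of the finished transcript. The Go sketch below assumes case-insensitive substring matching across all landlord messages; PromptArena's actual matching rules (case sensitivity, regex support, which messages are scanned) may differ, so the function names and semantics here are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// includesAll reports whether every pattern appears somewhere in the
// joined landlord messages (hypothetical case-insensitive substring match).
func includesAll(messages, patterns []string) bool {
	text := strings.ToLower(strings.Join(messages, "\n"))
	for _, p := range patterns {
		if !strings.Contains(text, strings.ToLower(p)) {
			return false
		}
	}
	return true
}

// excludesAll reports whether none of the patterns appear anywhere.
func excludesAll(messages, patterns []string) bool {
	text := strings.ToLower(strings.Join(messages, "\n"))
	for _, p := range patterns {
		if strings.Contains(text, strings.ToLower(p)) {
			return false
		}
	}
	return true
}

func main() {
	landlord := []string{
		"The listed rent is twenty-five hundred.",
		"I can't go to twenty-three hundred.",
		"Twenty-four fifty works with a twelve-month lease.",
	}
	fmt.Println(includesAll(landlord, []string{"twenty-four fifty"})) // true
	fmt.Println(includesAll(landlord, []string{"twelve-month"}))      // true
	fmt.Println(excludesAll(landlord, []string{
		"I accept twenty-three hundred", "twenty-three hundred works",
	})) // true
	// Applied per turn, the first reply alone would fail the price check.
	fmt.Println(includesAll(landlord[:1], []string{"twenty-four fifty"})) // false
}
```

The last line shows the over-constraint problem: early turns legitimately lack the final price, so only the end state is a fair target.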
For stricter contracts, layer in `llm_judge_session` with criteria over the full conversation:

```yaml
- type: llm_judge_session
  params:
    criteria: |
      Did the landlord maintain a professional negotiation posture
      across all turns, avoiding capitulation while reaching an
      acceptable deal? Score 1.0 if yes.
    judge: default
    min_score: 0.8
```

(That assertion needs a judge provider; see the assertion catalog for the standard judge params.)
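The `min_score` gate boils down to a threshold comparison on the judge's score. A minimal Go sketch, assuming the judge returns a JSON object with a numeric `score` in [0, 1]; that response shape and the `judgeVerdict` type are hypothetical, not the documented judge contract.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// judgeVerdict is a hypothetical judge response shape.
type judgeVerdict struct {
	Score     float64 `json:"score"`
	Reasoning string  `json:"reasoning"`
}

// passes parses the judge's raw JSON and applies the min_score gate.
// Unparseable or out-of-range output is reported as an error, not a pass.
func passes(raw []byte, minScore float64) (bool, error) {
	var v judgeVerdict
	if err := json.Unmarshal(raw, &v); err != nil {
		return false, fmt.Errorf("unparseable judge output: %w", err)
	}
	if v.Score < 0 || v.Score > 1 {
		return false, fmt.Errorf("score %.2f outside [0, 1]", v.Score)
	}
	return v.Score >= minScore, nil
}

func main() {
	ok, err := passes([]byte(`{"score": 0.85, "reasoning": "held firm, closed at 2450"}`), 0.8)
	fmt.Println(ok, err) // true <nil>
}
```

Failing closed on malformed judge output keeps a flaky judge from silently passing a bad negotiation.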
Switching to real self-play
The default scenario uses scripted user turns. To run with a real LLM driving the tenant side:
- Add a persona file:

  ```yaml
  # personas/savvy-tenant.persona.yaml
  apiVersion: promptkit.altairalabs.ai/v1alpha1
  kind: Persona
  metadata:
    name: savvy-tenant
  spec:
    id: savvy-tenant
    description: "Cost-conscious tenant negotiating rent."
    system_template: |
      You are a tenant negotiating monthly rent. Target $2300/month
      with a 12-month lease. Be polite but firm; don't accept above
      $2500. If the landlord won't budge below $2500, walk away.
  ```

- Add a real text-LLM provider:

  ```yaml
  # providers/openai-gpt4o-mini-text.provider.yaml
  apiVersion: promptkit.altairalabs.ai/v1alpha1
  kind: Provider
  metadata:
    name: openai-gpt4o-mini-text
  spec:
    id: openai-gpt4o-mini-text
    type: openai
    model: gpt-4o-mini
  ```

- Replace the scripted user turns with a `selfplay-user` block:

  ```yaml
  turns:
    - role: selfplay-user
      persona: savvy-tenant
      turns: 4
  ```

- Run with `OPENAI_API_KEY` set in your environment.
The assertions stay the same. The persona-driver LLM generates a fresh tenant message each turn, the landlord responds, and the conversation evolves dynamically. The report shows each persona-driven turn, which is useful for spotting cases where the persona behaves unexpectedly.
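The self-play loop alternates two speakers for a fixed number of rounds and hands the full transcript to the assertions. A minimal Go sketch under stated assumptions: `Speaker` is a hypothetical interface, and both sides are deterministic stubs (a tenant conceding 50 per round, a landlord with a hypothetical 2450 floor) so it runs without an API key. In a real run the tenant would be the persona-driven LLM and the landlord your prompt config.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Message is one entry in the transcript.
type Message struct {
	Role    string
	Content string
}

// Speaker is a hypothetical stand-in for either side: given the
// transcript so far, it produces that side's next message.
type Speaker func(history []Message) string

// selfPlay alternates tenant and landlord for n rounds and returns
// the full transcript, which is what conversation-level assertions see.
func selfPlay(tenant, landlord Speaker, n int) []Message {
	var history []Message
	for i := 0; i < n; i++ {
		history = append(history, Message{Role: "tenant", Content: tenant(history)})
		history = append(history, Message{Role: "landlord", Content: landlord(history)})
	}
	return history
}

func main() {
	// Stub persona driver: opens at 2300 and concedes 50 each round.
	offer := 2300
	tenant := func(history []Message) string {
		msg := fmt.Sprintf("Offer %d for twelve months?", offer)
		offer += 50
		return msg
	}
	// Stub landlord: accepts at or above a hypothetical 2450 floor.
	landlord := func(history []Message) string {
		fields := strings.Fields(history[len(history)-1].Content)
		amount, err := strconv.Atoi(fields[1])
		if err == nil && amount >= 2450 {
			return "Deal: twenty-four fifty on a twelve-month lease."
		}
		return "I can't go that low."
	}
	for _, m := range selfPlay(tenant, landlord, 4) {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```

Swapping the stubs for model-backed speakers changes the content of each turn but not the loop, which is why the same end-state assertions still apply.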
CI gate
```yaml
# .github/workflows/text-negotiation.yml
name: Text negotiation

on:
  pull_request:
    paths:
      - 'examples/text-negotiation/**'

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: '1.26'
      - run: make build-arena
      - name: Run text-negotiation
        working-directory: examples/text-negotiation
        run: ../../bin/promptarena run --ci --formats json
```

Keyless and fork-safe with the default scripted config.
Extending it
- Reservation-price assertions: instead of asserting on specific phrases, parse the final agreed amount and assert on it numerically. `tool_calls_with_args` works well if the agent ends the negotiation with a `record_agreement(amount: ...)` tool call.
- Walk-away scenarios: the scripted tenant offers $2000; assert that the landlord ends the conversation, for example with `content_includes` on phrases like “won’t be able to make this work” or “not the right fit.”
- Adversarial tenants: persona variants such as aggressive, indecisive, or deadline-pressured. Same landlord, different self-play personas, different outcome profiles.