PieArena: Ranking and Profiling Language Agents in Realistic Negotiation Scenarios

ArXiv CS.AI, 2026-06-03
Authors: Chris Zhu, Sasha Cui, Will Sanok Dufallo, Runzhi Jin, Zhen Xu, Linjun Zhang, Daylian Cain

Abstract

arXiv:2602.05302v3

We present an in-depth evaluation of LLMs' ability to negotiate, a central business task requiring strategic reasoning, theory of mind, and economic value creation. To do so, we introduce PieArena, a large-scale negotiation benchmark grounded in multi-agent interactions over realistic scenarios adapted from MBA negotiation courses at an elite business school. We evaluate language agents across three pairing regimes: mirror-play, cross-play, and human-LM play. We develop a ranking model for continuous negotiation payoffs that yields order-invariant, uncertainty-quantified leaderboards while correcting for systematic experimental asymmetries. We further study the effects of joint-intentionality agentic scaffolding and find asymmetric gains, with large improvements for mid- and lower-tier LMs and diminishing returns for frontier LMs.
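The abstract does not spell out the ranking model, so the following is only a rough illustration of what an order-invariant, uncertainty-quantified ranking over continuous payoffs could look like. It assumes a simple additive model in which the first-listed agent's share of the pie, minus 0.5, equals its skill minus the opponent's skill plus a single role-bias term standing in for one kind of systematic asymmetry. The model, the function names `fit_skills` and `bootstrap_intervals`, and the toy match data are all hypothetical and are not taken from the paper.

```python
import random
from statistics import mean


def fit_skills(matches, agents, steps=2000, lr=0.1):
    """Fit share - 0.5 ~ skill[first] - skill[second] + role_bias by batch
    gradient descent on mean squared error (hypothetical model)."""
    skill = {name: 0.0 for name in agents}
    role_bias = 0.0
    n = len(matches)
    for _ in range(steps):
        grad = {name: 0.0 for name in agents}
        grad_bias = 0.0
        for first, second, share in matches:
            resid = skill[first] - skill[second] + role_bias - (share - 0.5)
            grad[first] += resid / n
            grad[second] -= resid / n
            grad_bias += resid / n
        for name in agents:
            skill[name] -= lr * grad[name]
        role_bias -= lr * grad_bias
    centre = mean(skill.values())  # skills are identified only up to a shift
    return {name: value - centre for name, value in skill.items()}, role_bias


def bootstrap_intervals(matches, agents, resamples=100, seed=0):
    """Rough 95% percentile intervals from resampling matches with replacement."""
    rng = random.Random(seed)
    draws = {name: [] for name in agents}
    for _ in range(resamples):
        sample = [rng.choice(matches) for _ in matches]
        skills, _ = fit_skills(sample, agents, steps=500)
        for name in agents:
            draws[name].append(skills[name])
    intervals = {}
    for name, values in draws.items():
        values.sort()
        lo = values[int(0.025 * resamples)]
        hi = values[int(0.975 * resamples) - 1]
        intervals[name] = (lo, hi)
    return intervals


# Toy data (made up): (first-role agent, second-role agent, first agent's share).
agents = ["A", "B", "C"]
matches = [
    ("A", "B", 0.65), ("A", "C", 0.75),
    ("B", "A", 0.45), ("B", "C", 0.65),
    ("C", "A", 0.35), ("C", "B", 0.45),
]
skills, role_bias = fit_skills(matches, agents)
leaderboard = sorted(agents, key=skills.get, reverse=True)
intervals = bootstrap_intervals(matches, agents)
```

Because the fit is computed over the whole batch of matches, the estimates do not depend on the order in which games are listed, which is one simple way to obtain order invariance. The bootstrap is a generic stand-in for uncertainty quantification and may differ from whatever the authors use.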