Stop Comparing LLM Agents Without Disclosing the Harness 文章

ArXiv CS.AI2026-05-26NEWSen作者: Yunbei Zhang, Janet Wang, Yingqiang Ge, Weijie Xu, Jihun Hamm, Chandan K. Reddy

摘要

arXiv:2605.23950v1 Announce Type: new Abstract: This position paper argues that, for long-horizon tasks evaluated across models with comparable frontier capability, the agent execution harness, namely the infrastructure layer that governs context construction, tool interaction, orchestration, and verification around a language model, is often a stronger determinant of agent performance than the model it wraps. We formalize and defend the Binding Constraint Thesis: in this regime, performance variance is governed more by harness configuration than by model choice, and current evaluation protocols therefore systematically misattribute harness-level gains to model improvements. We support this thesis along three lines.

Stop Comparing LLM Agents Without Disclosing the Harness 文章

摘要

相关事件查看全部 (1)

相关公司查看全部 (4)

相关人物

相关产品查看全部 (8)

相关技术查看全部 (23)