Stop Comparing LLM Agents Without Disclosing the Harness (Event)

PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM

arXiv:2605.23950v1 Announce Type: new

Abstract: This position paper argues that, for long-horizon tasks evaluated across models with comparable frontier capability, the agent execution harness, namely the infrastructure layer that governs context construction, tool interaction, orchestration, and verification around a language model, is often a stronger determinant of agent performance than the model it wraps. We formalize and defend the