BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing? · Event

PRODUCT_LAUNCH · 2026-05-27 · Impact: MEDIUM

arXiv:2603.03194v2 (Announce Type: replace)

Abstract: Current code-agent benchmarks primarily evaluate localized issue resolution within a single target repository, leaving under-tested many software engineering tasks that require external knowledge or broader repository-level changes. We introduce BeyondSWE, a 500-instance benchmark drawn from 246 real-world GitHub repositories to evaluate code agents beyond single-repository …
