Beyond Pass Rate: A Multilingual, Execution-Grounded Evaluation of Open Code LLMs

PRODUCT_LAUNCH · 2026-06-09 · Impact: MEDIUM

arXiv:2606.08840v1 · Announce Type: new

Abstract: Code generation models are typically compared using compact execution benchmarks and aggregate pass rates, but such summaries obscure how performance varies across programming languages, problem families, and failure modes. We present a large-scale, execution-grounded evaluation of 9 openly accessible LLMs specialized for coding on 2,707 free LeetCode problems across
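
The abstract argues that a single aggregate pass rate can hide variation across languages, problem families, and failure modes. The following minimal sketch illustrates this. The record schema (`Result`), the outcome labels, the helper names, and the toy data are all hypothetical and are not taken from the paper. The sketch only shows how one set of execution outcomes yields a single aggregate number alongside quite different per-language and per-family numbers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Hypothetical schema: one execution outcome for one model/problem/language."""
    model: str
    language: str
    family: str   # problem family (hypothetical label, e.g. a topic tag)
    outcome: str  # "pass" or a failure mode such as "compile_error"


def pass_rate(results):
    """Fraction of results whose outcome is 'pass' (0.0 for an empty list)."""
    results = list(results)
    if not results:
        return 0.0
    return sum(r.outcome == "pass" for r in results) / len(results)


def stratified_pass_rate(results, key):
    """Pass rate per group, where key(result) returns the group label."""
    groups = defaultdict(list)
    for r in results:
        groups[key(r)].append(r)
    return {g: pass_rate(rs) for g, rs in sorted(groups.items())}


def failure_breakdown(results):
    """Count each non-pass outcome, so failures are not collapsed into one number."""
    counts = defaultdict(int)
    for r in results:
        if r.outcome != "pass":
            counts[r.outcome] += 1
    return dict(sorted(counts.items()))


# Hypothetical toy data, for illustration only (not results from the paper).
toy = [
    Result("model-a", "python", "dp", "pass"),
    Result("model-a", "python", "graph", "pass"),
    Result("model-a", "rust", "dp", "compile_error"),
    Result("model-a", "rust", "graph", "pass"),
]

print(pass_rate(toy))                                    # 0.75
print(stratified_pass_rate(toy, lambda r: r.language))   # {'python': 1.0, 'rust': 0.5}
print(stratified_pass_rate(toy, lambda r: r.family))     # {'dp': 0.5, 'graph': 1.0}
print(failure_breakdown(toy))                            # {'compile_error': 1}
```

In the toy data, the aggregate 0.75 hides the fact that every Rust failure is a compile error, while Python solutions all pass. Stratifying by language, by family, and by outcome type shows that structure.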
