AlgoVeri: An Aligned Benchmark for Verified Code Generation on Classical Algorithms

ArXiv cs.CL, 2026-06-04. Authors: Haoyu Zhao, Ziran Yang, Jiawei Li, Deyuan He, Zenan Li, Chi Jin, Venugopal V. Veeravalli, Aarti Gupta, Sanjeev Arora

Abstract

arXiv:2602.09464v2

Vericoding refers to the generation of formally verified code from rigorous specifications. Recent AI models show promise in vericoding, but a unified methodology for cross-paradigm evaluation is lacking. Existing benchmarks each test only an individual language or tool (e.g., Dafny, Verus, or Lean) and cover very different tasks, so their performance numbers are not directly comparable. We address this gap with AlgoVeri, a benchmark that evaluates vericoding of 77 classical algorithms in Dafny, Verus, and Lean. By enforcing identical functional contracts, AlgoVeri reveals critical capability gaps in verification systems. While frontier models achieve tractable success in Dafny (40.3% for Gemini-3 Flash), where high-level abstractions and SMT automation simplify the workflow, performance collapses under the systems-level memory constraints of Verus (24.7%) and the explicit proof construction required by Lean (7.8%).
