FVSpec: Real-World Property-Based Tests as Lean Challenges (Article)

ArXiv CS.AI | 2026-06-02 | NEWS | en | Authors: Quinn Dougherty, Max von Hippel, Hazel Shackleton, Mike Dodds

Details

Source site: ArXiv CS.AI
Authors: Quinn Dougherty, Max von Hippel, Hazel Shackleton, Mike Dodds
Article type: NEWS
Language: en
Publication date: 2026-06-02

Abstract

arXiv:2606.01008v1 (announce type: cross). We present a benchmark for evaluating AI models and agents on real-world formal software verification tasks. We first scrape 11,039 property-based tests (PBTs) from real-world Python repositories, then automatically translate 2,772 of them (25%) into 9,415 Lean 4 specifications with `sorry` placeholders (about 3 formalizations per PBT; we retain multiple attempts when none dominates on quality metrics). Translating PBTs into Lean specifications is challenging: it requires modeling Python semantics in Lean, inferring the logical property encoded in an imperative PBT, and handling the inherent difficulties of dependently-typed programming in a seldom-used language. We describe a three-agent LLM pipeline for transpiling PBTs into Lean specifications, evaluate coverage and quality metrics, and provide baselines for proof generation using several automated and model-based approaches.
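
To make the idea of a "Lean 4 specification with a `sorry` placeholder" concrete, here is a minimal sketch of what one translated PBT might look like. It is a hypothetical illustration and is not taken from the FVSpec dataset: the Python test in the comment, the name `pyReversed`, the theorem name, and the choice to model Python integers as Lean's `Int` are all assumptions made for this example.

```lean
-- Hypothetical illustration; not an actual FVSpec benchmark entry.
--
-- Assumed source PBT (Python, Hypothesis), shown as a comment:
--   @given(st.lists(st.integers()))
--   def test_reverse_twice(xs):
--       assert list(reversed(list(reversed(xs)))) == xs

/-- Assumed Lean model of the Python code under test.
    Python's arbitrary-precision ints are modeled as `Int`. -/
def pyReversed (xs : List Int) : List Int :=
  xs.reverse

/-- The logical property the imperative test checks on sampled inputs,
    restated as a claim about every input list. -/
theorem reverse_twice (xs : List Int) :
    pyReversed (pyReversed xs) = xs := by
  sorry -- placeholder: the proof is the challenge posed to the model or agent
```

In this sketch the randomized `@given` strategy becomes a universally quantified argument, and the `assert` becomes the theorem statement. A proof attempt replaces `sorry`; for a toy case like this one, a call such as `simp [pyReversed]` would likely close the goal, whereas specifications derived from real repositories would be expected to be considerably harder.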
