FVSpec: Real-World Property-Based Tests as Lean Challenges 文章

ArXiv CS.AI2026-06-02NEWSen作者: Quinn Dougherty, Max von Hippel, Hazel Shackleton, Mike Dodds

摘要

arXiv:2606.01008v1 Announce Type: cross Abstract: We present a benchmark for evaluating AI models and agents on real-world formal software verification tasks. We first scrape 11,039 property-based tests (PBTs) from real-world Python repositories, then automatically translate 2,772 of them (25%) into 9,415 Lean 4 specifications with sorry placeholders (about 3 formalizations/PBT; we retain multiple attempts when none dominates on quality metrics). Translating PBTs into Lean specifications is challenging: it requires modeling Python semantics in Lean, inferring the logical property encoded in an imperative PBT, and handling the inherent difficulties of dependently-typed programming in a seldom-used language. We describe a three-agent LLM pipeline for transpiling PBTs into Lean specifications, evaluate coverage and quality metrics, and provide baselines for proof generation using several automated and model based approaches.

相关事件查看全部 (1)

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据