FVSpec: Real-World Property-Based Tests as Lean Challenges

Event | PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2606.01008v1 | Announce Type: cross

Abstract: We present a benchmark for evaluating AI models and agents on real-world formal software verification tasks. We first scrape 11,039 property-based tests (PBTs) from real-world Python repositories, then automatically translate 2,772 of them (25%) into 9,415 Lean 4 specifications with `sorry` placeholders (about 3 formalizations/PBT; we retain multiple attempts when none dominates on quality