TheoremBench: Evaluating LLMs on Theorem Proving in Formal Mathematics (Event)

PRODUCT_LAUNCH | 2026-06-09 | Impact: MEDIUM

arXiv:2606.09450v1. Announce type: new.

Abstract: LLMs have recently achieved strong results on formal proving benchmarks. However, existing evaluations remain heavily concentrated on competition-style problems and often fail to capture how models behave on longer, more dependency-rich mathematical developments. We introduce TheoremBench, a Lean 4 benchmark designed to evaluate theorem provers beyond contest settings. The bench