TheoremBench: Evaluating LLMs on Theorem Proving in Formal Mathematics
Event type: PRODUCT_LAUNCH | Date: 2026-06-09 | Impact: MEDIUM
arXiv:2606.09450v1 (announce type: new)
Abstract: LLMs have recently achieved strong results on formal proving benchmarks. However, existing evaluations remain heavily concentrated on competition-style problems and often fail to capture how models behave on longer, more dependency-rich mathematical developments. We introduce TheoremBench, a Lean4 benchmark designed to evaluate theorem provers beyond contest settings. The bench
Source: ArXiv CS.AI, 2026-06-09