TheoremBench: Evaluating LLMs on Theorem Proving in Formal Mathematics (Event)

Name: TheoremBench: Evaluating LLMs on Theorem Proving in Formal Mathematics
Start: 2026-06-09

Type: PRODUCT_LAUNCH | Date: 2026-06-09 | Impact: MEDIUM

arXiv:2606.09450v1 (Announce Type: new)

Abstract: LLMs have recently achieved strong results on formal proving benchmarks. However, existing evaluations remain heavily concentrated on competition-style problems and often fail to capture how models behave on longer, more dependency-rich mathematical developments. We introduce TheoremBench, a Lean4 benchmark designed to evaluate theorem provers beyond contest settings. The bench…
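The abstract contrasts self-contained competition problems with longer developments in which later results depend on earlier ones. As a rough illustration only, here is a toy Lean 4 dependency chain. It is not drawn from TheoremBench, every theorem name in it is invented for this sketch, and it assumes plain core Lean 4 with no Mathlib. Each theorem reuses the previous one instead of re-proving it:

```lean
-- Hypothetical illustration only; none of these names come from TheoremBench.

-- Step 1: a helper lemma, closed by the core `omega` decision procedure.
theorem double_eq_add_self (n : Nat) : 2 * n = n + n := by
  omega

-- Step 2: an existential statement proved by reusing Step 1 directly.
theorem double_is_sum_of_equal (n : Nat) : ∃ k, 2 * n = k + k :=
  ⟨n, double_eq_add_self n⟩

-- Step 3: a result that depends on Step 2 (applied twice) rather than
-- re-deriving it; `omega` then combines the two hypotheses hx and hy.
theorem sum_of_doubles (a b : Nat) : ∃ k, 2 * a + 2 * b = k + k :=
  match double_is_sum_of_equal a, double_is_sum_of_equal b with
  | ⟨x, hx⟩, ⟨y, hy⟩ => ⟨x + y, by omega⟩
```

In a realistic development the chain is far longer and the intermediate lemmas are less trivial. A prover must then locate and apply the right earlier results, which an isolated contest problem does not test.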

Artificial Intelligence
