Risk-Controlled Lean-as-Judge for Natural-Language Mathematical Reasoning (event)

Name: Risk-Controlled Lean-as-Judge for Natural-Language Mathematical Reasoning
Start: 2026-05-28

PRODUCT_LAUNCH | 2026-05-28 | Impact: MEDIUM

arXiv:2605.28365v1. Announce Type: cross. Abstract: Lean is increasingly used to judge natural-language mathematical answers, but its signal is partial: many answers never formalize, and a failed proof may reflect an ill-typed statement or a missing library fact rather than a wrong answer. On MATH-500 we show this signal is (i) sharply coverage-dependent, that is, the proof-winning answer is correct 96% of the time at high proved c

Artificial Intelligence
