Reliable Self-Improvement Training by Verifying Reasoning, Not Just Answers

ArXiv CS.AI | 2026-06-01 | NEWS | en | Author: Xinyu Zhang

Details

Source: ArXiv CS.AI
Author: Xinyu Zhang
Article type: NEWS
Language: en
Published: 2026-06-01

Abstract

arXiv:2603.21558v2 (announce type: replace)

Self-improvement training, where models learn from self-generated solutions, promises sustained capability gains but suffers from a pervasive failure mode: across multiple rounds, compounding reasoning errors cause accuracy to stall or degrade. We trace this drift to standard filtering criteria that retain solutions based solely on final-answer correctness, which lets lucky guesses (correct answers with flawed reasoning) contaminate the training data. We propose Verified Self-Improvement (VSI), a framework that conditions data retention on step-level structural integrity rather than just the final output. VSI validates solutions by recomputing arithmetic steps via a computer-algebra library (sympy), checking intermediate consistency, and enforcing domain constraints.
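The difference between answer-only filtering and step-level filtering can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract names sympy for recomputing steps, while this sketch swaps in Python's standard `ast` and `fractions` modules so it runs without third-party packages. The step format ("expression = claimed result"), the function names `evaluate`, `step_is_valid`, and `keep_solution`, and the `constraint` parameter are all assumptions made for illustration.

```python
import ast
import operator
from fractions import Fraction

# Binary operators the toy evaluator accepts (exact rational arithmetic).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str) -> Fraction:
    """Evaluate a plain arithmetic expression exactly, without eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            value = node.value
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return Fraction(str(value))
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr.strip(), mode="eval"))


def step_is_valid(step: str) -> bool:
    """Recompute the left-hand side and compare it with the claimed result."""
    if step.count("=") != 1:
        return False
    lhs, rhs = step.split("=")
    try:
        return evaluate(lhs) == evaluate(rhs)
    except (ValueError, SyntaxError, ZeroDivisionError):
        return False


def keep_solution(steps, final_answer, reference_answer,
                  constraint=lambda value: True) -> bool:
    """Retain a solution only if the answer AND every step check out."""
    try:
        answer = evaluate(final_answer)
        # Answer-only filtering would stop after this comparison.
        if answer != evaluate(reference_answer):
            return False
    except (ValueError, SyntaxError, ZeroDivisionError):
        return False
    # Step-level check: every claimed intermediate result is recomputed.
    if not steps or not all(step_is_valid(s) for s in steps):
        return False
    # Consistency check (assumed form): the last step must yield the answer.
    if evaluate(steps[-1].split("=")[1]) != answer:
        return False
    # Domain constraint (hypothetical), e.g. "a count cannot be negative".
    return bool(constraint(answer))


sound = ["3 * 4 = 12", "12 + 5 = 17"]
lucky = ["3 * 4 = 14", "14 + 3 = 17"]  # right answer, wrong first step

print(keep_solution(sound, "17", "17"))  # True
print(keep_solution(lucky, "17", "17"))  # False
```

Under answer-only filtering both solutions would be kept, since each ends in 17. The step-level check rejects the second one because its first step claims 3 * 4 = 14, which is the "lucky guess" case the abstract describes.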
