Reliable Self-Improvement Training by Verifying Reasoning, Not Just Answers

ArXiv CS.AI | 2026-06-01 | NEWS | en | Author: Xinyu Zhang

Details

Source site: ArXiv CS.AI
Author: Xinyu Zhang
Article type: NEWS
Language: en
Publication date: 2026-06-01

Abstract

arXiv:2603.21558v2

Self-improvement training, where models learn from self-generated solutions, promises sustained capability gains but suffers from a pervasive failure mode: across multiple rounds, compounding reasoning errors cause accuracy to stall or degrade. We trace this drift to standard filtering criteria that retain solutions based solely on final-answer correctness, which lets lucky guesses (correct answers with flawed reasoning) contaminate the training data. We propose Verified Self-Improvement (VSI), a framework that conditions data retention on step-level structural integrity rather than just the final output. VSI validates solutions by recomputing arithmetic steps via a computer-algebra library (sympy), checking intermediate consistency, and enforcing domain constraints.
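
The three checks named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes each reasoning step is written as a plain `lhs = rhs` arithmetic claim, and the names `Solution`, `step_holds`, `verify`, and `non_negative_integer` are hypothetical, introduced here only for illustration. The only library calls used are `sympy.sympify` and `sympy.simplify`.

```python
from dataclasses import dataclass
from typing import Callable

import sympy


@dataclass
class Solution:
    steps: list[str]    # assumed format: one "lhs = rhs" arithmetic claim per step
    final_answer: str


def step_holds(step: str) -> bool:
    """Recompute one 'lhs = rhs' step with sympy and check both sides agree."""
    if step.count("=") != 1:
        return False  # steps that cannot be parsed are rejected, not trusted
    lhs_text, rhs_text = step.split("=")
    try:
        # sympify evaluates the string, so it is unsuitable for untrusted input
        lhs = sympy.sympify(lhs_text.strip())
        rhs = sympy.sympify(rhs_text.strip())
    except Exception:
        return False
    return sympy.simplify(lhs - rhs) == 0


def non_negative_integer(value: sympy.Expr) -> bool:
    """Example domain constraint (an assumption): answers are counts."""
    return bool(value.is_integer and value.is_nonnegative)


def verify(solution: Solution, reference_answer: str,
           domain_ok: Callable[[sympy.Expr], bool]) -> bool:
    """Keep a solution only if the answer AND every step check out."""
    try:
        final = sympy.sympify(solution.final_answer.strip())
        reference = sympy.sympify(reference_answer.strip())
    except Exception:
        return False
    # 1. Final-answer correctness (the standard filter on its own stops here).
    if sympy.simplify(final - reference) != 0:
        return False
    # 2. Step-level recomputation of every arithmetic claim.
    if not solution.steps or not all(step_holds(s) for s in solution.steps):
        return False
    # 3. Intermediate consistency: the last step must produce the final answer.
    last_rhs = sympy.sympify(solution.steps[-1].split("=")[1].strip())
    if sympy.simplify(last_rhs - final) != 0:
        return False
    # 4. Domain constraint supplied by the caller.
    return domain_ok(final)


# Task: compute 3 * 4 + 5. Both candidates reach the correct answer, 17.
good = Solution(["3 * 4 = 12", "12 + 5 = 17"], "17")
lucky = Solution(["3 * 4 = 14", "14 + 5 = 17"], "17")  # two errors that cancel

kept = [s for s in (good, lucky) if verify(s, "17", non_negative_integer)]
```

An answer-only filter would retain both candidates, since both end in 17. In this sketch only `good` survives, because the steps in `lucky` fail recomputation even though its final answer is correct.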
