Controllable and Verifiable Process Data Synthesis for Process Reward Models

Details

Source: ArXiv CS.AI
Authors: Yinghui Chi, Lucien Wang
Article type: NEWS
Language: en
Published: 2026-06-06

Abstract

arXiv:2605.02395v2 (Announce Type: replace)

Process reward models (PRMs) rely on high-quality process supervision data, yet existing construction methods often provide limited control over error location, error type, and trajectory consistency. We propose a controllable and verifiable framework for synthesizing process supervision data for PRMs. Our framework first constructs a correct symbolic reasoning chain, injects a template-aware error into an intermediate step, recomputes subsequent steps under the corrupted state, and verifies that the injected step is not derivable from its prefix. The resulting paired trajectories are prefix-invalid at the first error while remaining trajectory-consistent after symbolic recomputation, and are translated into aligned natural-language processes for PRM training and evaluation. Experiments show that the synthesized data improve Best-of-8 reranking on logical reasoning benchmarks and transfer to mathematical reasoning.
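To make the four stages of the pipeline concrete (build a correct chain, inject an error, recompute the rest, verify non-derivability), here is a toy sketch in a propositional Horn-rule domain. It is not the authors' implementation. The `Rule` representation, the "keep the premises, swap the conclusion" error template, the closure-based derivability check, the 1/0 prefix labels, and the `render` string template (standing in for natural-language translation) are all illustrative assumptions, and every function name here is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """Horn rule: if every premise holds, the conclusion holds."""
    premises: frozenset
    conclusion: str

def closure(facts, rules):
    """Every atom derivable from `facts` by repeatedly applying `rules`."""
    known, changed = set(facts), True
    while changed:
        changed = False
        for r in rules:
            if r.premises <= known and r.conclusion not in known:
                known.add(r.conclusion)
                changed = True
    return known

def forward_chain(facts, rules, max_steps):
    """Build a chain in which each step fires a rule whose premises already hold."""
    known, steps = set(facts), []
    while len(steps) < max_steps:
        fired = next((r for r in rules
                      if r.premises <= known and r.conclusion not in known), None)
        if fired is None:
            break
        steps.append(fired)
        known.add(fired.conclusion)
    return steps

def synthesize_pair(facts, rules, n_steps, k):
    """Hypothetical helper: return (clean, corrupted, labels), first error at index k."""
    clean = forward_chain(facts, rules, n_steps)
    if not 0 <= k < len(clean):
        raise ValueError("error position out of range")
    prefix_state = set(facts) | {s.conclusion for s in clean[:k]}
    derivable = closure(prefix_state, rules)
    vocab = sorted(set(facts) | {a for r in rules for a in r.premises | {r.conclusion}})
    # Assumed error template: keep step k's premises but claim the first atom
    # the prefix cannot derive. Checking against the closure is the verification.
    wrong = next((a for a in vocab if a not in derivable), None)
    if wrong is None:
        raise ValueError("no non-derivable conclusion available")
    injected = Rule(clean[k].premises, wrong)
    # Recompute the remaining steps by forward chaining from the corrupted state,
    # so every later step is consistent with the injected (false) fact.
    tail = forward_chain(prefix_state | {wrong}, rules, len(clean) - k - 1)
    corrupted = clean[:k] + [injected] + tail
    # Assumed prefix labels: 1 while the prefix is valid, 0 from the first error on.
    labels = [1] * k + [0] * (len(corrupted) - k)
    return clean, corrupted, labels

def render(step):
    """Toy string template standing in for natural-language translation."""
    return f"Since {' and '.join(sorted(step.premises))}, we conclude {step.conclusion}."

# Usage on a tiny rule base: A -> B -> C -> D, plus an unreachable Y -> Z.
rules = [
    Rule(frozenset({"A"}), "B"),
    Rule(frozenset({"Y"}), "Z"),
    Rule(frozenset({"B"}), "C"),
    Rule(frozenset({"C"}), "D"),
]
clean, corrupted, labels = synthesize_pair({"A"}, rules, n_steps=3, k=1)
for step, label in zip(corrupted, labels):
    print(label, render(step))
# 1 Since A, we conclude B.
# 0 Since B, we conclude Y.
# 0 Since Y, we conclude Z.
```

In this toy run the clean chain concludes B, C, D. The injected step claims Y, which the closure of {A, B} cannot reach, and the recomputed tail then validly derives Z from Y. That pairing mirrors the abstract's description: the corrupted trajectory is invalid from the first error onward yet internally consistent afterward.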
