Controllable and Verifiable Process Data Synthesis for Process Reward Models

ArXiv CS.AI | 2026-06-06 | NEWS | en | Authors: Yinghui Chi, Lucien Wang

Details

Source site: ArXiv CS.AI
Authors: Yinghui Chi, Lucien Wang
Article type: NEWS
Language: en
Publication date: 2026-06-06

Abstract

arXiv:2605.02395v2 (announce type: replace)

Process reward models (PRMs) rely on high-quality process supervision data, yet existing construction methods often provide limited control over error location, error type, and trajectory consistency. We propose a controllable and verifiable framework for synthesizing process supervision data for PRMs. Our framework first constructs a correct symbolic reasoning chain, injects a template-aware error into an intermediate step, recomputes subsequent steps under the corrupted state, and verifies that the injected step is not derivable from its prefix. The resulting paired trajectories are prefix-invalid at the first error while remaining trajectory-consistent after symbolic recomputation, and are translated into aligned natural-language processes for PRM training and evaluation. Experiments show that the synthesized data improve Best-of-8 reranking on logical reasoning benchmarks and transfer to mathematical reasoning.
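
The four stages named in the abstract (build a correct chain, inject an error, recompute downstream steps, verify non-derivability) can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes a hypothetical arithmetic domain in which each step applies one operator to a running integer state, and every name in it (`Step`, `build_chain`, `inject_error`, `derivable`, `first_error_index`, `render`) is invented for illustration. The error rule is "template-aware" only in the loose sense that it depends on the step's operator.

```python
import random
from dataclasses import dataclass

# Hypothetical toy domain: a "symbolic chain" is a list of arithmetic steps.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
VERBS = {"+": "add", "-": "subtract", "*": "multiply by"}


@dataclass(frozen=True)
class Step:
    op: str       # one of "+", "-", "*"
    operand: int
    before: int   # state the step starts from (its prefix state)
    after: int    # state the step claims to produce


def build_chain(start, program):
    """Stage 1: build a correct chain by applying each (op, operand) in order."""
    steps, state = [], start
    for op, operand in program:
        nxt = OPS[op](state, operand)
        steps.append(Step(op, operand, state, nxt))
        state = nxt
    return steps


def derivable(step):
    """Verifier: does the claimed result follow from the step's prefix state?"""
    return OPS[step.op](step.before, step.operand) == step.after


def inject_error(chain, index, rng):
    """Stages 2-4: corrupt step `index`, verify it, recompute later steps."""
    target = chain[index]
    if target.op == "*":
        wrong = target.before + target.operand  # operator confusion: * read as +
    else:
        wrong = target.after + rng.choice([-2, -1, 1, 2])  # small arithmetic slip
    if wrong == target.after:  # e.g. 2 * 2 == 2 + 2: the "error" is still correct
        wrong = target.after + 1
    bad = Step(target.op, target.operand, target.before, wrong)
    assert not derivable(bad), "injected step must not follow from its prefix"

    corrupted = list(chain[:index]) + [bad]
    state = wrong
    for s in chain[index + 1:]:
        # Recompute from the corrupted state so later steps stay locally consistent.
        nxt = OPS[s.op](state, s.operand)
        corrupted.append(Step(s.op, s.operand, state, nxt))
        state = nxt
    return corrupted


def first_error_index(chain):
    """Index of the first step that is not derivable from its prefix, else None."""
    for i, step in enumerate(chain):
        if not derivable(step):
            return i
    return None


def render(step):
    """Toy natural-language rendering of one step."""
    return f"Start from {step.before}, {VERBS[step.op]} {step.operand}, get {step.after}."


correct = build_chain(3, [("+", 4), ("*", 2), ("-", 5)])
corrupted = inject_error(correct, index=1, rng=random.Random(0))
for good, bad in zip(correct, corrupted):
    print(render(good), "|", render(bad))
```

In this toy setting the paired trajectories share a prefix, diverge at the chosen step, and every step after the injected one is still derivable from its own (corrupted) prefix state, so `first_error_index` recovers exactly the injection point. The guard for accidentally correct errors shows why a verification stage is needed at all: an error template can coincide with the true result.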
