Beyond the Frontier: Stochastic Backtracking for Efficient Test-Time Scaling 文章

ArXiv CS.AI2026-05-26NEWSen作者: Dao Tran, Duc Anh Le, Ngoc Luu, Quan Pham, Tung Pham, Hung Bui

摘要

arXiv:2605.25143v1 Announce Type: new Abstract: Test-time scaling improves language model reasoning by spending additional compute to explore multiple solution trajectories. The key challenge is to maximize accuracy while minimizing the total number of generated tokens during reasoning. Recent PRM-guided methods score intermediate prefixes to steer this search, but most are frontier-only: they keep only the current active prefixes and irreversibly prune or resample away the rest using noisy PRM scores. This can cause premature commitment, diversity collapse, and the loss of prefixes that still admit correct continuations. We introduce stochastic backtracking over a persistent pool of historical prefixes, allowing test-time compute to revisit previously generated states instead of only expanding the current frontier. To make this efficient, we propose two complementary mechanisms.

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据