Reasoning with Sampling: Cutting at Decision Points

ArXiv CS.CL | 2026-05-29 | News | Language: en | Authors: Felix Zhou, Anay Mehrotra, Quanquan C. Liu

Abstract

arXiv:2605.30327v1 Announce Type: cross Abstract: Frontier reasoning models are produced by posttraining base language models with reinforcement learning. Recent work has challenged this by showing that sampling from a sharpened version of the base model's distribution, a so-called power distribution, elicits comparable reasoning without additional training, curated datasets, or verifiers. However, making this method practical requires efficiently sampling from the power distribution. A sampler needs to "mix" to the power distribution, which necessitates moving between modes of the target distribution; intuitively, e.g., trying different reasoning strategies. The samplers proposed in prior works repeatedly select a "cut" position in the current reasoning trace uniformly at random and resample the suffix from that position onward. However, reasoning traces typically contain a few consequential decisions (e.g.
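
The uniform-cut sampler that the abstract attributes to prior work can be sketched as a Metropolis-Hastings chain. This is an illustrative sketch under stated assumptions, not the authors' code and not their decision-point method. The base model is a hypothetical two-token Markov chain (`INIT`, `TRANS`), traces have a fixed length with no end-of-sequence token, the base model itself serves as the suffix proposal, and every name below is made up for this example. The target is the power distribution π(x) ∝ p(x)^α. Because the prefix before the cut is shared by the current and proposed traces, the acceptance ratio reduces to the ratio of suffix probabilities raised to the power α − 1.

```python
import math
import random
from itertools import product

# Hypothetical toy "base model": a first-order Markov chain over two tokens.
# It stands in for an autoregressive language model; the numbers are illustrative.
VOCAB = [0, 1]
INIT = [0.6, 0.4]                 # p(x_0)
TRANS = [[0.7, 0.3], [0.2, 0.8]]  # p(x_t | x_{t-1})
LENGTH = 4                        # fixed trace length (assumption: no EOS token)


def token_prob(prefix, tok):
    """p(tok | prefix) under the toy base model."""
    return INIT[tok] if not prefix else TRANS[prefix[-1]][tok]


def suffix_logprob(seq, t):
    """log p(seq[t:] | seq[:t]) under the base model."""
    return sum(math.log(token_prob(seq[:i], seq[i])) for i in range(t, len(seq)))


def sample_suffix(prefix, rng):
    """Extend `prefix` to LENGTH tokens by sampling from the base model."""
    seq = list(prefix)
    while len(seq) < LENGTH:
        probs = [token_prob(seq, v) for v in VOCAB]
        seq.append(rng.choices(VOCAB, weights=probs)[0])
    return seq


def power_mh(alpha, steps, seed=0):
    """Metropolis-Hastings targeting pi(x) proportional to p(x)**alpha.

    Each step picks a cut t uniformly from {0, ..., LENGTH-1}, resamples the
    suffix x[t:] from the base model, and accepts with probability
    min(1, (p(x'[t:] | x[:t]) / p(x[t:] | x[:t])) ** (alpha - 1)).
    Returns the empirical frequency of each visited trace.
    """
    rng = random.Random(seed)
    x = sample_suffix([], rng)
    counts = {}
    for _ in range(steps):
        t = rng.randrange(LENGTH)               # uniform random cut position
        proposal = sample_suffix(x[:t], rng)    # keep prefix, resample suffix
        log_ratio = (alpha - 1) * (suffix_logprob(proposal, t) - suffix_logprob(x, t))
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        counts[tuple(x)] = counts.get(tuple(x), 0) + 1
    return {s: c / steps for s, c in counts.items()}


def exact_power(alpha):
    """Exact normalized power distribution by enumeration (toy sizes only)."""
    weights = {s: math.exp(alpha * suffix_logprob(list(s), 0))
               for s in product(VOCAB, repeat=LENGTH)}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}
```

For example, `power_mh(alpha=4.0, steps=100_000)` can be compared against `exact_power(4.0)`. With α = 4 this toy target puts most of its mass on the two traces `0000` and `1111`. Since only a cut at position 0 can change the first token, the chain can switch between those two modes only when the uniformly chosen cut happens to land there. This is a small-scale picture of why the choice of cut position matters.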
