What Makes Chain-of-Thought Work at Probe Time? Local Co-occurrence Rather Than Global Derivation

arXiv cs.AI | 2026-05-27 | Authors: Xiang Wang, Wei Wei

Abstract

arXiv:2605.26795v1. Chain-of-thought (CoT) prompting reliably improves language-model accuracy, but it is poorly understood which properties of a rationale text drive the improvement. Prior work has largely studied generation-time behavior. We instead ask a probe-time question: given a fixed rationale in context, what in that text changes the answer? We identify two complementary sources of the gain. First, even a globally word-shuffled rationale substantially outperforms the no-rationale baseline, indicating a strong lexical activation effect. More importantly, the additional gain from structured text appears to arise less from sentence-level logical ordering and more from short-range token adjacency. Preserving contiguous windows of just $n^\star{=}2$--$3$ tokens recovers most of the remaining gain toward full CoT performance.
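To make the two perturbations in the abstract concrete, the following is a minimal sketch of a global word shuffle and a window-preserving shuffle. It is not the authors' procedure: the abstract does not specify the tokenizer or how windows are formed, so this sketch assumes whitespace tokenization and non-overlapping windows of `n` tokens that are permuted as blocks. The function names `shuffle_words` and `shuffle_windows` and the example rationale are hypothetical.

```python
import random


def shuffle_words(tokens, seed=0):
    """Global shuffle: permute all tokens, destroying adjacency and order."""
    rng = random.Random(seed)
    out = list(tokens)
    rng.shuffle(out)
    return out


def shuffle_windows(tokens, n, seed=0):
    """Window-preserving shuffle (assumed form): split the tokens into
    non-overlapping windows of n tokens, permute the windows, and keep the
    token order inside each window intact."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    windows = [tokens[i:i + n] for i in range(0, len(tokens), n)]
    rng.shuffle(windows)
    return [tok for window in windows for tok in window]


# Hypothetical rationale; whitespace splitting stands in for a real tokenizer.
rationale = "there are 3 boxes with 4 apples each so 3 times 4 is 12"
tokens = rationale.split()

print(" ".join(shuffle_words(tokens)))          # lexical content only
print(" ".join(shuffle_windows(tokens, n=2)))   # adjacent pairs kept together
```

Under these assumptions, both variants contain exactly the same words as the original rationale, so any accuracy difference between them at probe time would be attributable to the adjacency that the windowed variant retains rather than to lexical content.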