Self-Prophetic Decoding to Unlock Visual Search in LVLMs

ArXiv CS.CV | 2026-05-28 | Authors: Zhendong He, Qiyuan Dai, Guanbin Li, Liang Lin, Sibei Yang

Abstract

arXiv:2605.28741v1 (announce type: new)

Large Vision-Language Models (LVLMs) are rapidly evolving toward true multimodal reasoning, with visual search representing a concrete instantiation of the thinking-with-images paradigm. However, LVLM visual search faces two key challenges: incompatibility among intrinsic capabilities after post-training, and interference in long multi-step reasoning contexts. To address these, we identify two novel insights. First, self-regulation between pre- and post-training LVLMs leverages the intrinsic single-step capabilities of the pre-training model to mitigate capability deterioration and long-context interference. Second, probability-based prophetic sampling, replacing naive prompting, provides a probabilistic interface where the pre-training model acts as a prophet and the post-training model selectively accepts prophetic tokens under its output distribution, preserving coherent multi-step reasoning.
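The abstract does not spell out how the post-training model "selectively accepts" prophetic tokens. As a rough illustration only, the sketch below borrows the standard accept/reject rule from speculative sampling: a token proposed by the prophet is kept with probability min(1, p_target / p_prophet), and on rejection a replacement is drawn from the normalised positive part of the difference between the two distributions. This rule, the function names, and the toy distributions are all assumptions made for this example, not the authors' stated method.

```python
import random


def accept_prophetic_token(token, p_prophet, p_target, rng):
    """Accept the proposed token with probability min(1, p_target / p_prophet).

    ASSUMPTION: this is the speculative-sampling acceptance rule, used here
    only as a stand-in; the abstract does not give the paper's actual rule.
    """
    q = p_prophet[token]
    p = p_target[token]
    if q <= 0.0:
        # The prophet could never have proposed this token; reject defensively.
        return False
    return rng.random() < min(1.0, p / q)


def residual_distribution(p_prophet, p_target):
    """Normalised max(0, p_target - p_prophet), sampled from after a rejection."""
    raw = [max(0.0, p - q) for p, q in zip(p_target, p_prophet)]
    total = sum(raw)
    if total == 0.0:
        # Both distributions coincide; fall back to the target itself.
        return list(p_target)
    return [r / total for r in raw]


def prophetic_step(p_prophet, p_target, rng):
    """One decoding step: the prophet proposes, the target accepts or resamples.

    p_prophet: hypothetical next-token distribution of the pre-training model.
    p_target:  hypothetical next-token distribution of the post-training model.
    Returns (token_index, accepted_flag).
    """
    vocab = list(range(len(p_prophet)))
    proposal = rng.choices(vocab, weights=p_prophet, k=1)[0]
    if accept_prophetic_token(proposal, p_prophet, p_target, rng):
        return proposal, True
    fallback_weights = residual_distribution(p_prophet, p_target)
    fallback = rng.choices(vocab, weights=fallback_weights, k=1)[0]
    return fallback, False


if __name__ == "__main__":
    rng = random.Random(0)
    prophet = [0.6, 0.3, 0.1]   # toy distribution over a 3-token vocabulary
    target = [0.2, 0.5, 0.3]
    token, accepted = prophetic_step(prophet, target, rng)
    print("token:", token, "accepted:", accepted)
```

Under this borrowed rule, tokens the post-training model rates at least as likely as the prophet does are always kept, while tokens it rates lower are kept only some of the time, so the final output still follows the post-training model's distribution.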
