ReGuLaR: Relation-Grounded Latent Reasoning for Large Vision-Language Models

ArXiv CS.CV | 2026-06-01 | Authors: Zihu Wang, Karthik Somayaji N. S, Peng Li

Abstract

arXiv:2605.30587v1

Chain-of-thought (CoT) reasoning has significantly improved the reasoning ability of large vision-language models (LVLMs) by verbalizing intermediate reasoning steps in natural language. However, such discrete textual rationales are often insufficient for encoding continuous visual evidence. Recent work addresses this limitation by moving reasoning into a continuous latent space. Despite promising progress, existing methods leave latent reasoning insufficiently connected to the compositional and relational structure of visual evidence. To address this gap, we introduce ReGuLaR, a relation-grounded latent reasoning framework that explicitly grounds latent states in this critical yet overlooked visual evidence. ReGuLaR uses a training-time ReGFormer to focus latent reasoning on question-relevant objects and inter-object relations, while at inference time the model reasons and generates answers without invoking the ReGFormer.
