Vision Inference Former: Sustaining Visual Consistency in Multimodal Large Language Models

ArXiv CS.CV · 2026-06-03 · Authors: Xinpeng Dong, Min Zhang, Kairong Han, Xu Tan, Fei Wu, Kun Kuang
