Can Retrieval Heads See Images? Multimodal Retrieval Heads in Long-Context Vision-Language Models
ArXiv CS.CV, 2026-05-27
Authors: Aaron Branson Cigres Li, Zhaowei Wang, Yu Zhao, Yiming Du, Haobo Li, Xiyu Ren, Ginny Wong, Simon See, Lishu Luo, Haodong Duan, Pasquale Minervini, Yangqiu Song