CIVIC: End-to-End Sequence Compactness for Efficient Vision-Language Models

Source: arXiv cs.AI, 2026-05-28
Authors: Fengze Yang, Bo Yu, Xuewen Luo, Cathy Liu, Chenxi Liu

Abstract

arXiv:2605.28115v1 (Announce Type: new)

Vision-Language Models (VLMs) face severe memory and latency bottlenecks due to high-resolution visual tokens. While current token reduction methods theoretically save FLOPs, post-hoc pruning introduces structural overhead, failing to yield proportional wall-clock acceleration. However, enforcing a contiguous compact pathway risks geometric disorientation and loss of fine-grained localization. To overcome these barriers, this paper introduces CIVIC, a path-consistent compact visual inference framework. By maintaining compact sequence representations seamlessly across the vision encoder, projection layer, LLM prefill, and KV-cache, CIVIC avoids non-contiguous memory access and localized unmerging overheads.
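The memory argument in the abstract rests on a simple relationship: a decoder's KV cache grows linearly with sequence length, so a visual sequence that stays compact through LLM prefill shrinks the cache in the same proportion. The sketch below illustrates this under stated assumptions. The compaction step is a hypothetical 2x2 average-pooling merge (`merge_2x2`), chosen only because it yields a dense, row-major grid with no index bookkeeping; the abstract does not say how CIVIC actually compacts tokens. The model dimensions passed to `kv_cache_bytes` (32 layers, 8 KV heads, head dimension 128, 2-byte fp16 elements) and the 24x24 patch grid are likewise hypothetical.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Size of a standard decoder KV cache for one sequence.

    Keys and values (the factor of 2) are stored per layer, per KV head,
    per token. Dimensions are caller-supplied; nothing here is CIVIC-specific.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


def merge_2x2(grid):
    """Hypothetical compaction: average each non-overlapping 2x2 block.

    `grid` is a row-major H x W list of feature vectors (lists of floats).
    The result is a dense (H/2) x (W/2) grid in the same row-major order,
    so downstream stages can treat it as an ordinary contiguous sequence
    without carrying kept-token indices or unmerging later.
    """
    h, w = len(grid), len(grid[0])
    if h % 2 or w % 2:
        raise ValueError("grid height and width must be even")
    merged = []
    for r in range(0, h, 2):
        row = []
        for c in range(0, w, 2):
            block = [grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1]]
            # Element-wise mean over the four vectors in the block.
            row.append([sum(vals) / 4 for vals in zip(*block)])
        merged.append(row)
    return merged


# Hypothetical model: 32 layers, 8 KV heads, head_dim 128, fp16.
dims = dict(num_layers=32, num_kv_heads=8, head_dim=128)
full = kv_cache_bytes(24 * 24, **dims)     # 576 visual tokens
compact = kv_cache_bytes(12 * 12, **dims)  # 144 tokens after one 2x2 merge
print(full, compact, full / compact)  # 75497472 18874368 4.0
```

A 4x reduction in visual tokens gives a 4x smaller visual share of the KV cache, and because the merged grid is still dense and row-major, no gather or unmerge step is needed between stages. Naive pooling like this also shows the trade-off the abstract names: averaging neighboring patches discards fine-grained localization, which is the failure mode CIVIC is described as overcoming.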
