CIVIC: End-to-End Sequence Compactness for Efficient Vision-Language Models

ArXiv CS.AI | 2026-05-28 | Authors: Fengze Yang, Bo Yu, Xuewen Luo, Cathy Liu, Chenxi Liu

Abstract

arXiv:2605.28115v1

Vision-Language Models (VLMs) face severe memory and latency bottlenecks because high-resolution images produce large numbers of visual tokens. Current token reduction methods save FLOPs in theory, but post-hoc pruning introduces structural overhead and so fails to yield proportional wall-clock acceleration. Enforcing a contiguous compact pathway instead, however, risks geometric disorientation and loss of fine-grained localization. To overcome these barriers, this paper introduces CIVIC, a path-consistent compact visual inference framework. By maintaining compact sequence representations across the vision encoder, projection layer, LLM prefill, and KV-cache, CIVIC avoids non-contiguous memory access and localized unmerging overheads.
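
To make the memory pressure from visual tokens concrete, the back-of-the-envelope sketch below estimates KV-cache size as a function of sequence length. It is not taken from the paper: the model dimensions (`num_layers`, `num_kv_heads`, `head_dim`) and the token counts are hypothetical values chosen only for illustration, and the function name `kv_cache_bytes` is our own. It shows only that cache size grows linearly with the number of tokens kept in the sequence, which is why a sequence that stays compact through to the KV-cache can reduce memory.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV-cache size in bytes for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes 16-bit (fp16/bf16) cache entries.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical decoder configuration (an assumption, not from the paper).
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128)

# Hypothetical token counts (assumptions, not from the paper).
text_tokens = 64
full_visual_tokens = 2304     # uncompressed visual sequence
compact_visual_tokens = 576   # visual sequence after a 4x reduction

full = kv_cache_bytes(text_tokens + full_visual_tokens, **cfg)
compact = kv_cache_bytes(text_tokens + compact_visual_tokens, **cfg)

print(f"full sequence:    {full / 2**20:.0f} MiB")
print(f"compact sequence: {compact / 2**20:.0f} MiB")
print(f"reduction factor: {full / compact:.2f}x")
```

Under these assumed numbers the cache shrinks in proportion to the total token count, not to the visual token count alone, because the text tokens are unaffected. Wall-clock gains are a separate question: as the abstract notes, they depend on avoiding the structural overhead of post-hoc pruning, which this arithmetic does not model.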
