Focus-then-Context: Subject-Centric Progressive Visual Token Reduction for Vision-Language Models

Event

PRODUCT_LAUNCH | 2026-06-08 | Impact: MEDIUM

arXiv:2605.20950v2 | Announce Type: replace

Abstract: Vision-Language Models (VLMs) face a bottleneck of prohibitive computational costs arising from massive visual token sequences during inference. Existing vision token reduction methods alleviate this burden, but they unintentionally preserve the isolated visual subject strictly aligned with the user's query, which fails to substantially explore sa