Blink: Dynamic Visual Token Resolution for Enhanced Multimodal Understanding (Event)

PRODUCT_LAUNCH | 2026-05-26 | Impact: MEDIUM

arXiv:2512.10548v3. Announce Type: replace.

Abstract: Multimodal large language models (MLLMs) have achieved remarkable progress on various vision-language tasks, yet their visual perception remains limited. Humans, in comparison, perceive complex scenes efficiently by dynamically scanning and focusing on salient regions in a sequential "blink-like" process. Motivated by this strategy, we first investigate whether MLLMs