Blink: Dynamic Visual Token Resolution for Enhanced Multimodal Understanding
Event: PRODUCT_LAUNCH | Date: 2026-05-26 | Impact: MEDIUM
arXiv:2512.10548v3 (announce type: replace)
Abstract: Multimodal large language models (MLLMs) have achieved remarkable progress on various vision-language tasks, yet their visual perception remains limited. Humans, in comparison, perceive complex scenes efficiently by dynamically scanning and focusing on salient regions in a sequential "blink-like" process. Motivated by this strategy, we first investigate whether MLLMs
Source: ArXiv CS.CV, 2026-05-26