HiDe: Rethinking The Zoom-IN method in High Resolution MLLMs via Hierarchical Decoupling · Event

PRODUCT_LAUNCH · 2026-06-05 · Impact: MEDIUM

arXiv:2510.00054v3 · Announce Type: replace

Abstract: Multimodal Large Language Models (MLLMs) have made significant strides in visual understanding tasks. However, their performance on high-resolution images remains suboptimal. While existing approaches often attribute this limitation to perceptual constraints and argue that MLLMs struggle to recognize small objects, leading them to use "zoom in" strategies
