HiDe: Rethinking The Zoom-IN method in High Resolution MLLMs via Hierarchical Decoupling (Event)

PRODUCT_LAUNCH | 2026-06-05 | Impact: MEDIUM

arXiv:2510.00054v3 Announce Type: replace
Abstract: Multimodal Large Language Models (MLLMs) have made significant strides in visual understanding tasks. However, their performance on high-resolution images remains suboptimal. While existing approaches often attribute this limitation to perceptual constraints and argue that MLLMs struggle to recognize small objects, leading them to use "zoom in" strategies