LookWise: Knowing When and Where to Look for Fine-Grained Visual Reasoning in Multimodal Large Language Models 事件

PRODUCT_LAUNCH2026-06-02影响: MEDIUM

LookWise: Knowing When and Where to Look for Fine-Grained Visual Reasoning in Multimodal Large Language Models arXiv:2603.00171v3 Announce Type: replace Abstract: Multimodal Large Language Models (MLLMs) are shifting towards "Thinking with Images" by actively exploring image details. While effective, large-scale training is computationally expensive, which has spurred growing interest in lightweight, training-free solutions. However, existing training-free methods suffer from two flaws: percept