Self-Improving Small Object Grounding in LVLMs (Event)

PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2606.01612v1 (Announce Type: new)

Abstract: Can internal attention patterns in Large Vision Language Models (LVLMs) identify reliable small-object boxes without fine-tuning? In this work, we provide an affirmative answer. Attention structure in LVLMs encodes grounding quality: a lightweight IoU regressor trained solely on attention maps achieves strong IoU prediction (Pearson r > 0.67). This regressor powers the regressor-based variant of our At
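
The abstract's headline number (Pearson r > 0.67) compares IoU values predicted by the regressor with the true IoU of each candidate box. The sketch below only illustrates those two quantities; it is not the paper's regressor or attention pipeline, which the excerpt does not describe. The function names `box_iou` and `pearson_r` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration.

```python
import math


def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2).

    The corner-coordinate layout is an assumption, not taken from the paper.
    """
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

In an evaluation of this kind, one would compute `box_iou` between each candidate box and its ground-truth box, then pass the regressor's predicted IoU values and those true values to `pearson_r`.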