FOCUS: Forcing In-Context Object Localization through Visual Support Constraints and Policy Optimization

arXiv cs.CV, 2026-06-01
Authors: Mohammed Asad Karim, Vinay Kumar Verma

Abstract

arXiv:2605.31145v1

In-context localization (ICL) aims to localize, in a query image, a target object specified by a small set of support examples, operating on the fly without any training or parameter updates at inference time. Despite rapid advances in vision-language models (VLMs), category-agnostic, visually grounded ICL remains an open problem, even though it is essential for applications such as image editing, personalized visual search, and retrieval. Existing methods are fragile and rely on explicit category supervision. This not only limits their applicability in realistic settings with unnamed or instance-specific objects but also introduces a category bias that steers predictions toward semantic priors rather than visual evidence. We introduce a two-stage training framework that explicitly optimizes in-context attention between support bounding boxes and query images without category supervision.
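The abstract defines the task by its inputs and output: a few support examples, each marking the target object with a bounding box, plus a query image in which the same object must be found, with no category name involved. The sketch below makes that interface concrete and scores a predicted box with intersection-over-union (IoU), a standard localization metric. It is an illustration under stated assumptions, not the authors' framework: the (x1, y1, x2, y2) box format, the names `SupportExample`, `ICLEpisode`, `box_iou`, and `is_correct`, and the 0.5 IoU threshold are all hypothetical choices, and the abstract does not say how FOCUS is evaluated or rewarded.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: boxes are (x1, y1, x2, y2) in pixel coordinates, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


@dataclass
class SupportExample:
    image_id: str  # hypothetical identifier for a support image
    box: Box       # bounding box marking the target object in that image


@dataclass
class ICLEpisode:
    supports: List[SupportExample]  # small set of support examples; no category label field
    query_image_id: str             # image in which the target must be localized


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero when boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """Count a prediction as a hit if its IoU with the ground truth reaches the threshold."""
    return box_iou(pred, gt) >= threshold


if __name__ == "__main__":
    episode = ICLEpisode(
        supports=[SupportExample("support_0", (10, 10, 50, 50)),
                  SupportExample("support_1", (30, 20, 90, 80))],
        query_image_id="query_0",
    )
    pred, gt = (0.0, 0.0, 10.0, 10.0), (5.0, 5.0, 15.0, 15.0)
    print(len(episode.supports), box_iou(pred, gt), is_correct(pred, gt))
```

In the example, the two 10x10 boxes overlap in a 5x5 region, so the IoU is 25 / 175 = 1/7 (about 0.143), below the assumed 0.5 threshold. Keeping the episode free of any category field mirrors the category-agnostic setting the abstract describes: a localizer can only rely on the visual evidence in the support boxes.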
