LV-OSD: Language-Vision-Complementary Open-Set Object Detection

ArXiv CS.CV | 2026-05-28 | Authors: Yupeng Zhang, Ruize Han, Wei Feng, Song Wang, Liang Wan

Abstract

arXiv:2605.28271v1

Object detection is an important task in computer vision that aims to detect objects of interest specified by a given category list or by query images. In this work, we propose a new problem, language-visual-complementary open-set object detection (LV-OSD), in which flexible text-based and/or image-based prompts specify the desired object categories. This setting is more common and practical in real-world applications. For this purpose, we design a dual-branch detection framework, LVDor, which can accept text and image prompts simultaneously. Specifically, we first build the Multi-modal Prompts (MPr), which contain various text descriptions and image samples for each category. Subsequently, to bridge the semantic gap among the input image, text prompts, and image prompts, we design a Target-guided Prompt Dynamic Weighting (TPDW) module.
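The abstract does not say how TPDW computes its weights, so the sketch below is only one plausible reading of "target-guided dynamic weighting", not the authors' method. It assumes that each category has a pool of prompt embeddings (text and image mixed), that the input image is summarized by a single feature vector, and that weights come from a temperature-scaled softmax over cosine similarities. The names `weight_prompts`, `target_feat`, `prompt_embs`, and `temperature` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def softmax(scores, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weight_prompts(target_feat, prompt_embs, temperature=0.1):
    """Assumed weighting rule: prompts closer to the target image feature
    receive larger weights; the fused vector is their weighted sum."""
    sims = [cosine(target_feat, p) for p in prompt_embs]
    weights = softmax(sims, temperature)
    fused = [
        sum(w * p[i] for w, p in zip(weights, prompt_embs))
        for i in range(len(target_feat))
    ]
    return weights, fused


if __name__ == "__main__":
    # Toy 2-D embeddings: the first prompt matches the target, the second does not.
    target = [1.0, 0.0]
    prompts = [[1.0, 0.0], [0.0, 1.0]]
    weights, fused = weight_prompts(target, prompts)
    print(weights, fused)
```

In this toy setup the prompt aligned with the target feature dominates the fused category representation. A lower `temperature` sharpens that preference, while a higher one moves the result toward a plain average of the prompts.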