OneFocus: Enabling Real-World X-ray Security Screening with a Unified Vision-Language Model

ArXiv CS.CV | 2026-06-16 | NEWS | en | Authors: Jiali Wen, Hongxia Gao, Litao Li, Yixin Chen, Kaijie Zhang, Qianyun Liu, Xiaoqin Wen

Details

Source site: ArXiv CS.CV
Authors: Jiali Wen, Hongxia Gao, Litao Li, Yixin Chen, Kaijie Zhang, Qianyun Liu, Xiaoqin Wen
Article type: NEWS
Language: en
Publication date: 2026-06-16

Abstract

arXiv:2606.15663v1 (announce type: new)

X-ray contraband detection is critical for security in large-scale logistics and transportation, yet conventional detectors struggle to adapt to emerging contraband types and lack fundamental visual understanding. Vision-language models (VLMs) offer strong generalization but are hindered by the scarcity of high-quality X-ray image-caption data. To bridge this gap, we present MMXray, a meticulously curated benchmark of 52,124 image-caption pairs spanning 28 fine-grained classes of X-ray contraband. To enrich MMXray with realistic occlusion patterns, we further introduce CleanDET, a dedicated synthesis dataset containing clean foreground contraband images from 28 categories and background images with diverse density levels, together with AnyContraSyn, a controllable synthesis method designed to operate on CleanDET. We also develop OnePipe, an extensible pipeline for systematic data curation.
