Details
- Source site: ArXiv CS.CV
- Authors: Haiquan Wen, Tianxiao Li, Zhenglin Huang, Yiwei He, Guangliang Cheng
- Article type: NEWS
- Language: en
- Publication date: 2026-06-17
Abstract
arXiv:2507.14632v4. The rapid advancement of generative AI has substantially improved image and video synthesis, amplifying the risk of multimodal visual misinformation. Recent MLLMs have shown promise for transparent AI-generated content detection through reasoning and explanation, yet existing approaches largely treat image and video forensics as isolated tasks, leaving cross-modal synergies underexplored. To address this, we present BusterX++, a unified MLLM for joint image and video detection with interpretable reasoning. We also introduce GenBuster-Bench++, a meticulously curated, difficulty-aligned benchmark containing balanced image and video samples spanning recent generation models and diverse real-world scenarios. Using this controlled setting, we revisit the widely adopted SFT → RL post-training paradigm.