SafeGen-Bench: Benchmarking Safety in Image-Conditioned Text-to-Video Generation

arXiv cs.CV, 2026-06-02. Authors: Yingzi Ma, Xiaogeng Liu, Yawen Zheng, Chaowei Xiao

Abstract

arXiv:2606.01481v1

With rapid advances in text-to-image diffusion models, generative video models (T2V models) such as Sora can now produce short synthetic videos from a text prompt or an initial image. However, synthetic video generation, especially when guided by an initial image, can pose risks, including the creation of illegal, politically sensitive, or unethical content. Existing benchmarks have begun to consider the safety of generated videos, but they focus mainly on testing models with malicious text prompts and overlook cases in which a combination of text prompt and image still leads to harmful video content. In practice, this is a common and challenging problem: videos generated from individually safe text and image inputs can nonetheless convey harmful information. To bridge this gap, we introduce SafeGen-Bench, a benchmark designed specifically to evaluate the safety of conditional T2V models.