LoCoT2V-Bench: Benchmarking Long-Form and Complex Text-to-Video Generation

ArXiv CS.CV | 2026-05-29 | Authors: Xiangqing Zheng, Chengyue Wu, Kehai Chen, Min Zhang

Abstract

arXiv:2510.26412v3

Recent advances in text-to-video generation have achieved impressive performance on short clips, yet evaluating long-form generation under complex textual inputs remains a significant challenge. To address this, we present LoCoT2V-Bench, a benchmark for long video generation (LVG) featuring multi-scene prompts with hierarchical metadata (e.g., character settings and camera behaviors), constructed from collected real-world videos. We further propose LoCoT2V-Eval, a multi-dimensional evaluation framework covering perceptual quality, text-video alignment, temporal quality, dynamic quality, and Human Expectation Realization Degree (HERD), with an emphasis on fine-grained text-video alignment and temporal character consistency.
