T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation (Event)

PRODUCT_LAUNCH · 2026-06-03 · Impact: MEDIUM

arXiv:2512.21094v2 · Announce Type: replace

Abstract: Text-to-Audio-Video (T2AV) generation aims to synthesize temporally coherent video and semantically synchronized audio from natural language, yet its evaluation remains fragmented, often relying on unimodal metrics or narrowly scoped benchmarks that fail to capture cross-modal alignment, instruction following, and perceptual realism under complex prompts. To address th…
