DetailMaster: Can Your Text-to-Image Model Handle Long Prompts? 文章

ArXiv CS.CV2026-06-02NEWSen作者: Qirui Jiao, Daoyuan Chen, Yilun Huang, Xika Lin, Ying Shen, Yaliang Li

摘要

arXiv:2505.16915v3 Announce Type: replace Abstract: While recent Text-to-Image (T2I) models show impressive capabilities in synthesizing images from brief descriptions, they struggle with the long, detailed prompts required for professional applications. We present DetailMaster, a comprehensive benchmark for evaluating T2I capabilities on long prompts with complex compositional requirements, accompanied by an automated data construction pipeline and an evaluation workflow. Comprising expert-validated prompts averaging 284.89 tokens, our benchmark introduces four critical evaluation dimensions: Character Attributes, Structured Character Locations, Multi-Dimensional Scene Attributes, and Spatial/Interactive Relationships.

相关公司

暂无数据

相关人物

暂无数据

相关技术

暂无数据