Physical Plausibility Reasoning via HCM-GRPO: Empowering Compact Model for Superior Performance

arXiv cs.CV, 2026-06-03. Authors: Zhiyuan Hu, Zheng Sun, Yi Wei, Long Yu

Abstract

arXiv:2511.10055v2 (replaced version). Image generation performance has improved significantly in recent years. However, image screening has received little study, and Multimodal Large Language Models (MLLMs) perform poorly at it because training data is scarce and their physical plausibility reasoning is weak. In this work, we propose a complete solution that addresses these problems in terms of both data and methodology. For data, we collect a comprehensive image screening dataset of over 128k samples, comprising about 640k images. Each sample consists of an original image and four generated images. The dataset evaluates physical plausibility reasoning along four aspects: appearance deformation, physical shadow, placement layout, and extension rationality.
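The sample structure described in the abstract (one original image plus four generated images, evaluated along four aspects) can be illustrated with a small record type. This is a minimal sketch under stated assumptions: the names `Aspect`, `ScreeningSample`, `GENERATED_PER_SAMPLE`, and `total_images`, and the use of file paths as image references, are hypothetical and do not reflect the dataset's actual schema. It also checks the abstract's arithmetic, since 128k samples of five images each give 640k images.

```python
from dataclasses import dataclass
from enum import Enum


class Aspect(Enum):
    """The four physical-plausibility aspects named in the abstract."""
    APPEARANCE_DEFORMATION = "appearance deformation"
    PHYSICAL_SHADOW = "physical shadow"
    PLACEMENT_LAYOUT = "placement layout"
    EXTENSION_RATIONALITY = "extension rationality"


# Hypothetical constant: the abstract states four generated images per sample.
GENERATED_PER_SAMPLE = 4


@dataclass(frozen=True)
class ScreeningSample:
    """Hypothetical record: one original image plus its generated images.

    Field names and the use of file paths are illustrative assumptions,
    not the dataset's published format.
    """
    original: str
    generated: tuple[str, ...]

    def __post_init__(self) -> None:
        # Enforce the "one original + four generated" structure.
        if len(self.generated) != GENERATED_PER_SAMPLE:
            raise ValueError(
                f"expected {GENERATED_PER_SAMPLE} generated images, "
                f"got {len(self.generated)}"
            )

    @property
    def image_count(self) -> int:
        # The original image plus every generated image.
        return 1 + len(self.generated)


def total_images(num_samples: int) -> int:
    """Images implied by num_samples samples of 1 original + 4 generated."""
    return num_samples * (1 + GENERATED_PER_SAMPLE)


# 128k samples -> 640k images, matching the figures in the abstract.
print(total_images(128_000))
```

Freezing the dataclass and validating in `__post_init__` keeps a malformed sample (for example, one with only three generated images) from being constructed at all. The sketch deliberately does not assign an aspect to each sample, because the abstract does not say how aspects map to individual samples.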
