Evaluating Reasoning Fidelity in Visual Text Generation

ArXiv CS.CV | 2026-06-04 | Authors: Jiajun Hong, Jiawei Zhou

Abstract

arXiv:2606.04479v1

Recent text-to-image (T2I) models can render highly legible, well-structured text within images, enabling applications such as document and slide generation. However, it remains unclear whether such systems faithfully preserve reasoning ability when complex solutions must be expressed directly through rendered text, or whether they merely imitate surface-level patterns. We investigate this question by evaluating reasoning fidelity in visual text generation, where models must express complete reasoning processes as images. Our evaluation covers long text rendering, factual knowledge probing, context understanding, and multi-step reasoning. Across these settings, we find that current T2I models frequently produce semantic errors, logical inconsistencies, and incorrect intermediate steps, even when the rendered text appears visually clear.
