Evaluating Reasoning Fidelity in Visual Text Generation

ArXiv CS.CV | 2026-06-04 | Authors: Jiajun Hong, Jiawei Zhou

Abstract

arXiv:2606.04479v1

Recent text-to-image (T2I) models can render highly legible and well-structured text within images, enabling applications including document generation and slide generation. However, it remains unclear whether such systems faithfully preserve reasoning ability when complex solutions must be expressed directly through rendered text, or whether they merely imitate surface-level patterns. We investigate this question by evaluating reasoning fidelity in visual text generation, where models must express complete reasoning processes as images. Our evaluation includes long text rendering, factual knowledge probing, context understanding, and multi-step reasoning. Across these settings, we find that current T2I models frequently produce semantic errors, logical inconsistencies, and incorrect intermediate steps, even when the rendered text appears visually clear.
