Comparative Evaluation of Machine Translation Systems on Images with Text

ArXiv CS.CL, 2026-05-29
Authors: Blai Puchol, Sergio Gómez González, Miguel Domingo, Francisco Casacuberta

Abstract

arXiv:2605.29476v1

This work presents a comparative evaluation of machine translation systems applied to images containing textual information, a task that lies at the intersection of computer vision and natural language processing. The study compares three main paradigms: modular pipelines that separate text detection, recognition, and translation; multi-modal large language models (MLLMs) capable of processing image and text jointly; and an end-to-end model, Translatotron-V, which directly generates translated images. The modular systems employ state-of-the-art OCR (docTR) combined with multilingual LLMs such as Llama and EuroLLM, while the evaluated MLLMs include different configurations of Gemini 2.5. Experiments were conducted on parallel multilingual datasets covering multiple language pairs, with evaluation based on BLEU, chrF, and TER metrics.
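To make one of the metrics named in the abstract concrete, the following is a minimal sketch of a simplified chrF-style score (a character n-gram F-score). It is not the authors' evaluation code, and the abstract does not say which implementation they used. The function names `char_ngrams` and `simple_chrf` are hypothetical, and the simplifications (whitespace removed, character n-grams only, no smoothing, a single hypothesis-reference pair) are assumptions made for brevity, so scores will differ from those of a standard toolkit.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n, ignoring spaces (a simplification)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: F-beta over averaged character n-gram precision/recall.

    Hypothetical helper for illustration only; returns a value in [0, 1].
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # One side is too short to have n-grams of this order; skip it.
            continue
        # Clipped overlap: each n-gram counts at most as often as in the reference.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta = 2 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    # Toy sentence pair, not taken from the paper's datasets.
    print(simple_chrf("the cat sat on the mat", "the cat sits on the mat"))
```

A character-level metric of this kind gives partial credit for near-miss words, which may matter when the text to translate comes from an OCR stage that can introduce character-level recognition errors.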
