METATR: A Multilingual, Evolving Benchmark for Automatic Text Recognition

arXiv cs.CV | 2026-05-27 | Authors: Mélodie Boillet, Solène Tarride, Christopher Kermorvant

Abstract

arXiv:2605.26712v1

Benchmarks that reflect the diversity and complexity of real-world documents are essential for accurately evaluating Automatic Text Recognition (ATR) systems, especially Vision-Large Language Models (vLLMs). Although recent models demonstrate impressive performance, they are often evaluated on datasets containing modern, printed texts mostly written in English, which limits their relevance to many practical applications. Therefore, selecting a model for a specific use case requires evaluating it on data that matches the target documents. This highlights the importance of representative benchmarks for real-world applications. In this paper, we introduce METATR (v1.0), a multilingual, evolving benchmark designed to evaluate ATR models across a wide range of documents, facilitating meaningful model comparison and selection. The benchmark was designed to maximize diversity by including documents from various public collections.