Towards Reliable Multilingual LLMs-as-a-Judge: An Empirical Study

ArXiv CS.CL | 2026-05-28 | Authors: Irune Zubiaga, Aitor Soroa, Rodrigo Agerri

Abstract

arXiv:2605.28710v1

Large language models (LLMs) are increasingly used for the automatic evaluation of generated text, yet most prior work focuses on English. Despite the growing demand for multilingual evaluation, extending LLM-based evaluators to multilingual settings remains challenging, particularly for low-resource languages and for scenarios where in-domain data is scarce. This work explores several strategies for developing multilingual LLMs-as-a-judge, both when in-domain data is available for fine-tuning and when it is not. We systematically analyze English, Spanish, and Basque, representing high-, mid-, and low-resource languages, and examine the effects of instruction translation, monolingual versus multilingual supervision, and model size. For evaluation, we extend two existing meta-evaluation datasets to Basque and Spanish.
